(54) Method and apparatus for special video reproduction modes



(57) A special reproduction control information com-
prises a plurality of items (100) of frame information. Each
of the items of frame information comprises video loca-
tion information (101) indicating the location of video da-
ta to be reproduced in a special reproduction and display
time control information (102) indicating the time for dis-
playing the video data.
Description

[0001] The present invention relates to a special re- 
production control information describing method for de- 
scribing special reproduction control information used 
to perform special reproduction for target video con- 
tents, a special reproduction control information creat- 
ing method for creating the special reproduction control 
information and a special reproduction control informa- 
tion creating apparatus and a video reproduction appa- 
ratus and method for performing special reproduction 
by using the special reproduction control information. 
[0002] In recent years, a motion picture is com- 
pressed as a digital video and is stored in disk media 
represented by a DVD, and a HDD so that a video can 
be reproduced at random. A video can be reproduced 
halfway from a desired timing in the state of virtually no 
waiting time. As in conventional tape media, disk media 
can be fast reproduced at two to four times speed or can 
be reversely reproduced. 

[0003] However, there is a problem in that the length 
of a video can be very long in many cases, and time 
cannot be sufficiently compressed to view the whole 
contents of the video even at two to four times fast re- 
production. When the rate of the fast reproduction is in- 
creased, the scene change is enlarged to a degree ex- 
ceeding the ability to view it, so that grasping the con- 
tents is difficult, and even portions which are not needed 
are also reproduced so that waste is caused. 
[0004] Accordingly, the present invention is directed
to a method and apparatus that substantially obviate one
or more of the problems due to limitations and disad-
vantages of the related art.

[0005] According to one aspect of the present inven- 
tion, a method of describing frame information compris- 
es: 

describing, for a frame extracted from a plurality of 
frames in a source video data, first information 
specifying a location of the extracted frame in the 
source video data; and 

describing, for the extracted frame, second informa- 
tion relating to a display time of the extracted frame. 

[0006] According to another aspect of the present in- 
vention, an article of manufacture comprising a compu- 
ter usable medium storing frame information, the frame 
information comprises: 

first information, described for a frame extracted 
from a plurality of frames, specifying a location of 
the extracted frame in the source video data; and 
second information, described for the extracted 
frame, relating to a display time of the extracted 
frame. 

[0007] According to another aspect of the present in-
vention, an apparatus for creating frame information
comprises:

a unit configured to extract a frame from a plurality
of frames in a source video data;
a unit configured to create the frame information in-
cluding first information specifying a location of the
extracted frame and second information relating to
a display time of the extracted frame; and
a unit configured to link the extracted frame to the
frame information.

[0008] According to another aspect of the present in- 
vention, a method of creating frame information com- 
prises: 

extracting a frame from a plurality of frames in a
source video data; and

creating the frame information including first infor-
mation specifying a location of the extracted frame
in the source video data and second information re-
lating to a display time of the extracted frame.

[0009] According to another aspect of the present in-
vention, an apparatus for performing a special reproduc-
tion comprises:

a unit configured to refer to frame information de-
scribed for a frame extracted from a plurality of
frames in a source video data and including first in-
formation specifying a location of the extracted
frame in the source video data and second informa-
tion relating to a display time of the extracted frame;
a unit configured to obtain the video data corre-
sponding to the extracted frame based on the first
information;

a unit configured to determine the display time of
the extracted frame based on the second informa-
tion; and

a unit configured to display the obtained video data
for the determined display time.

[0010] According to another aspect of the present in- 
vention, an article of manufacture comprising a method 
of performing a special reproduction comprises: 

referring to frame information described for a frame
extracted from a plurality of frames in a source video
data and including first information specifying a lo-
cation of the extracted frame and second informa-
tion relating to a display time of the extracted frame;

obtaining the video data corresponding to the ex-
tracted frame based on the first information;
determining the display time of the extracted frame
based on the second information; and

displaying the obtained video data for the deter-
mined display time.

[0011] According to another aspect of the present in-
vention, an article of manufacture comprising an article
of manufacture comprising a computer usable medium 
having computer readable program code means em- 
bodied therein, the computer readable program code 
means performing a special reproduction, the computer 
readable program code means comprises: 

computer readable program code means for caus- 
ing a computer to refer to frame information de- 
scribed for a frame extracted from a plurality of 
frames in a source video data and including first in- 
formation specifying a location of the extracted 
frame and second information relating to a display 
time of the extracted frame; 
computer readable program code means for caus- 
ing a computer to obtain the video data correspond- 
ing to the extracted frame based on the first infor- 
mation; 

computer readable program code means for caus- 
ing a computer to determine the display time of the 
extracted frame based on the second information; 
and 

computer readable program code means for caus- 
ing a computer to display the obtained video data 
for the determined display time. 

[0012] According to another aspect of the present in- 
vention, an article of manufacture comprising a method 
of describing sound information, the method comprises: 

describing, for a frame extracted from a plurality of 
sound frames in a source sound data, first informa- 
tion specifying a location of the extracted frame in 
the source sound data; and 
describing, for the extracted frame, second informa-
tion relating to a reproduction start time and repro-
duction time of the sound data of the extracted
frame.

[0013] According to another aspect of the present in- 
vention, an article of manufacture comprising an article 
of manufacture comprising a computer usable medium 
storing frame information, the frame information com- 
prises: 

first information, described for a frame extracted
from a plurality of sound frames, specifying a loca- 
tion of the extracted frame in the source sound data; 
and 

second information, described for the extracted 
frame, relating to a reproduction start time and re- 
production time of the sound data of the extracted 
frame. 

[0014] According to another aspect of the present in- 
vention, an article of manufacture comprising a method 
of describing text information, the method comprises: 




describing, for a frame extracted from a plurality of 
text frames in a source text data, first information 
specifying a location of the extracted frame in the 
source text data; and 
describing, for the extracted frame, second informa-
tion relating to a display start time and display time
of the text data of the extracted frame.

[0015] According to another aspect of the present in-
vention, an article of manufacture comprising an article
of manufacture comprising a computer usable medium 
storing frame information, the frame information com- 
prises: 

first information, described for a frame extracted 
from a plurality of text frames in a source text data, 
specifying a location of the extracted frame in the 
source text data; and 

second information, described for the extracted 
frame, relating to a display start time and display 
time of the text data of the extracted frame.

[0016] This summary of the invention does not nec- 
essarily describe all necessary features so that the in- 
vention may also be a sub-combination of these de- 
scribed features. 

[0017] The present invention can be implemented ei-
ther in hardware or in software on a general purpose
computer. Further the present invention can be imple- 
mented in a combination of hardware and software. The 
present invention can also be implemented by a single 
processing apparatus or a distributed network of 
processing apparatuses. 

[0018] Since the present invention can be implement- 
ed by software, the present invention encompasses 
computer code provided to a general purpose computer 
on any suitable carrier medium. The carrier medium can 
comprise any storage medium such as a floppy disk, a 
CD ROM, a magnetic device or a programmable mem- 
ory device, or any transient medium such as any signal 
e.g. an electrical, optical or microwave signal. 
[0019] The invention can be more fully understood 
from the following detailed description when taken in 
conjunction with the accompanying drawings, in which: 

FIG. 1 is a view showing an example of a data struc-
ture of special reproduction control information ac-
cording to one embodiment of the present inven-
tion;

FIG. 2 is a view showing an example of a structure
of a special reproduction control information creat-
ing apparatus;

FIG. 3 is a view showing another example of a struc-
ture of the special reproduction control information
creating apparatus;

FIG. 4 is a flowchart showing one example for the 
apparatus shown in FIG. 2; 

FIG. 5 is a flowchart showing one example for the
apparatus shown in FIG. 3;

EP 1 168 840 A2


FIG. 6 is a view showing an example of a structure 
of a video reproduction apparatus; 
FIG. 7 is a flowchart showing one example for the
apparatus shown in FIG. 6;
FIG. 8 is a view showing an example of a data struc-
ture of special reproduction control information;
FIG. 9 is a view explaining video location informa-
tion for referring to an original video frame;
FIG. 10 is a view explaining video location informa-
tion for referring to an image data file;
FIG. 11 is a view explaining a method for extracting
video data in accordance with a motion of a screen;
FIG. 12 is a view explaining video location informa-
tion for referring to the original video frame;
FIG. 13 is a view for explaining video location infor-
mation for referring to the image data file;
FIG. 14 is a view showing an example of a data
structure of special reproduction control information
in which plural original video frames are referred to;
FIG. 15 is a view explaining a relation between the 
video location information and the original plural 
video frames; 

FIG. 16 is a view explaining a relation between the
image data file and the original plural video frames;
FIG. 17 is a view explaining video location informa-
tion for referring to the original video frame;
FIG. 18 is a view for explaining video location infor-
mation for referring to the image data file;
FIG. 19 is a flowchart for explaining a special re-
production;

FIG. 20 is a view for explaining a method for extract- 
ing video data in accordance with a motion of a 
screen; 

FIG. 21 is a view for explaining a method for extract-
ing video data in accordance with a motion of a
screen;

FIG. 22 is a flowchart showing one example for cal-
culating display time at which a scene change quan-
tity becomes constant as much as possible;
FIG. 23 is a flowchart showing one example for cal-
culating a scene change quantity of the whole frame
from an MPEG video;

FIG. 24 is a view for explaining a method for calcu-
lating a scene change quantity of a video from an
MPEG stream;

FIG. 25 is a view for explaining a processing proce-
dure for calculating display time at which a scene
change quantity becomes constant as much as pos-
sible;
FIG. 26 is a flowchart showing one example of the 
processing procedure for conducting special repro- 
duction on the basis of special reproduction control 
information; 

FIG. 27 is a flowchart showing one example for con-
ducting special reproduction on the basis of a dis-
play cycle;

FIG. 28 is a view for explaining a relationship be-
tween a calculated display time and the display cy-
cle;

FIG. 29 is a view for explaining a relationship be- 
tween a calculated display time and the display cy- 
cle; 

FIG. 30 is a view showing another example of a data
structure of special reproduction control informa-
tion;

FIG. 31 is a view explaining a method for extracting 
video data in accordance with a motion of a screen; 
FIG. 32 is a view explaining video location informa- 
tion for referring to the original video frame; 
FIG. 33 is a view showing another example of a data
structure of special reproduction control informa-
tion;

FIG. 34 is a view showing another example of a data
structure of special reproduction control informa-
tion;

FIG. 35 is a view showing another example of a data
structure of special reproduction control informa-
tion;

FIG. 36 is a flowchart showing one example for cal-
culating display time from the importance;
FIG. 37 is a view for explaining a method for calcu- 
lating display time from the importance; 
FIG. 38 is a flowchart showing one example for cal- 
culating importance data on the basis of the idea 
that a scene having a large sound level is important; 
FIG. 39 is a flowchart showing one example for cal-
culating importance data on the basis of the idea
that a scene in which many important words are
detected by sound recognition is important, or a processing
procedure for calculating importance data on the
basis of the idea that a scene in which the number
of words spoken per unit time is large is important;
FIG. 40 is a flowchart showing one example for cal-
culating importance data on the basis of the idea
that a scene in which many important words are
detected by telop recognition is important, or a processing
procedure for calculating importance data on the
basis of the idea that a scene in which the number
of words included in the telops appearing per unit
time is large is important;
FIG. 41 is a flowchart showing one example for cal- 
culating importance data on the basis of the idea 
that the scene in which a large character appears 
as a telop is important; 

FIG. 42 is a flowchart showing one example for cal- 
culating importance data on the basis of the idea 
that the scene in which many human faces appear 
is important or a processing for calculating impor- 
tance data on the basis of the idea that the scene 
where human faces are displayed in an enlarged 
manner is important; 

FIG. 43 is a flowchart showing one example for cal- 
culating importance data on the basis of the idea 
that the scene in which videos similar to the regis- 
tered important scene appear is important; 
FIG. 44 is a view showing another example of a data
structure of special reproduction control informa- 
tion; 

FIG. 45 is a view showing another example of a data 
structure of special reproduction control informa- 
tion; 

FIG. 46 is a view showing another example of a data 
structure of special reproduction control informa-
tion;

FIG. 47 is a view for explaining a relationship be- 
tween information as to whether the scene is to be 
reproduced or not and the reproduced video; 
FIG. 48 is a flowchart showing one example of a 
processing procedure of special reproduction in- 
cluding reproduction and non-reproduction judg- 
ment; 

FIG. 49 is a view showing one example of a data 
structure when sound information or text informa- 
tion is added; 

FIG. 50 is a view showing one example of a data 
structure for describing only sound information sep- 
arately from frame information; 
FIG. 51 is a view showing one example of a data 
structure for describing only text information sepa- 
rately from frame information; 
FIG. 52 is a view for explaining a synchronization 
of a reproduction of each of media; 
FIG. 53 is a flowchart showing one example of a 
determination procedure of a sound reproduction 
start time and a sound reproduction time in a video 
frame section; 

FIG. 54 is a flowchart showing one example for pre- 
paring reproduction sound data and correcting vid- 
eo frame display time; 

FIG. 55 is a flowchart showing one example of a 
processing procedure of obtaining text information 
with telop recognition; 

FIG. 56 is a flowchart showing one example of a 
processing procedure of obtaining text information 
with sound recognition; 
FIG. 57 is a flowchart showing one example of a
processing procedure of preparing text information; 
FIGS. 58A and 58B are views for explaining a meth- 
od of displaying text information; 
FIG. 59 is a view showing one example of a data
structure of special reproduction control information 
for sound information; 

FIG. 60 is a view showing another example of a data 
structure of special reproduction control information 
for sound information; 

FIG. 61 is a view explaining a summary reproduc- 
tion of the sound/music data; and 
FIG. 62 is a view explaining another summary re- 
production of the sound/music data. 

[0020] Preferred embodiments of the present inven- 
tion will now be described with reference to the accom- 
panying drawings. 
[0021] The embodiments relate to a reproduction of
video contents having video data using special repro-
duction control information. The video data comprises
a set of video frames (video frame group) constituting a
motion picture.

[0022] The special reproduction control information is
created from the video data by a special reproduction
control information creating apparatus and attached to
the video data. The special reproduction is reproduction
by a method other than a normal reproduction. The spe-
cial reproduction includes a double speed reproduction
(or a high speed reproduction), jump reproduction (or
jump continuous reproduction), and a trick reproduction.
The trick reproduction includes a substituted reproduc-
tion, an overlapped reproduction, a slow reproduction
and the like. The special reproduction control informa-
tion is referred to when the special reproduction is exe-
cuted in the video reproduction apparatus.
[0023] FIG. 1 shows one example of a basic data
structure of the special reproduction control information.
[0024] In this data structure, plural items of frame in-
formation "i" (i = 1 to N) are described in correspondence
to the frame appearance order in the video data. Each
item of frame information 100 includes a set of video
location information 101 and display time control information
102. The video location information 101 indicates a lo-
cation of video data to be displayed at the time of special
reproduction. The video data to be displayed may be one
frame, a group of a plurality of continuous frames, or a
group formed of a part of a plurality of continuous
frames. The display time control information 102 forms
the basis for calculating the display time of the video da-
ta.
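
As a non-authoritative illustration, the data structure of FIG. 1 could be modeled as in the following sketch. The class and field names (VideoLocation, start_frame, frame_count, and so on) are assumptions introduced only for this example and do not appear in the embodiment; only the reference numerals 100 to 103 are taken from the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VideoLocation:
    """Video location information (101): which source frames to display."""
    start_frame: int       # hypothetical field: index of the first referenced frame
    frame_count: int = 1   # 1 = one frame, >1 = a group of continuous frames


@dataclass
class FrameInfo:
    """One item of frame information (100)."""
    location: VideoLocation        # video location information (101)
    display_time_control: float    # display time control information (102)


@dataclass
class SpecialReproductionControlInfo:
    """Frame information items plus the optional reproduction rate (103)."""
    frames: List[FrameInfo] = field(default_factory=list)
    reproduction_rate: Optional[float] = None  # not essential, so it may be absent


info = SpecialReproductionControlInfo(
    frames=[
        FrameInfo(VideoLocation(start_frame=0), 0.5),
        FrameInfo(VideoLocation(start_frame=120, frame_count=30), 1.0),
    ]
)
```

In this sketch the second item refers to a group of 30 continuous frames, while the first refers to a single frame.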

[0025] In FIG. 1, the frame information "i" is arranged
in the order of appearance of the frames in the video da-
ta. When information indicating the order of the frame infor-
mation is described in the frame information "i", the
frame information "i" may be arranged and described in
any order.

[0026] The reproduction rate information 103 at-
tached to a plurality of items of frame information "i"
shows the reproduction speed rate and is used for des-
ignating the reproduction at a speed several times high-
er than that corresponding to the display time as de-
scribed by the display time control information 102.
However, the reproduction rate information 103 is not
essential information. The information 103 may always
be attached, never be attached, or be selectively
attached. Even when the reproduction rate in-
formation 103 is attached, the information may not be
used at the time of special reproduction. The reproduc-
tion rate information may always be used, never be
used, or be selectively used.
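
The optional nature of the reproduction rate information 103 can be sketched as follows. This is a minimal example under the assumption that a rate of k simply divides the described display time by k; the function name and the use_rate flag are hypothetical and are not part of the embodiment.

```python
from typing import Optional


def effective_display_time(display_time: float,
                           rate: Optional[float] = None,
                           use_rate: bool = True) -> float:
    """Return the display time after applying the optional reproduction rate.

    Assumption: a rate of k shortens the described display time by a factor k.
    """
    if rate is None or not use_rate:
        # The rate is not attached, or the player chooses not to use it.
        return display_time
    if rate <= 0:
        raise ValueError("reproduction rate must be positive")
    return display_time / rate
```

For example, a display time of 2.0 seconds with a rate of 4.0 would be shown for 0.5 seconds, and for the full 2.0 seconds when the rate is absent or ignored.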
[0027] In FIG. 1, it is possible to further add other con-
trol information to the frame information group together
with the reproduction rate information or in place of the
reproduction rate information. In FIG. 1, it is also possi-
ble to add different control information to each item of
frame information "i". In these cases, all of the information
included in the special reproduction control information may
be used on the side of the video reproduction device, or
only a part of the information may be used.

[0028] FIG. 2 shows an example of a structure of an
apparatus for creating special reproduction control in-
formation.

[0029] This special reproduction control information
creating device comprises a video data storage unit 2,
a video data processing unit 1 including a video location
information processing unit 11 and a display time control
information processing unit 12, and a special reproduc-
tion control information storage unit 3. As will be de-
scribed later in detail, since the video data (encoded data)
must be decoded before it is displayed, the time required
for decoding elapses from when the display instruction
is issued until the video is displayed. In order to reduce
this processing time, it
is proposed to decode the video data beforehand and
store the result as an image data file.
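
The benefit of decoding beforehand can be illustrated with the following sketch. The decode function is only a stand-in for a real decoder, and all names here are hypothetical; the point shown is that frames stored in advance are looked up at display time without any further decoding.

```python
decode_calls = 0  # counts how often the (stand-in) decoder runs


def decode(encoded: bytes) -> str:
    """Stand-in for real video decoding, which would be the costly step."""
    global decode_calls
    decode_calls += 1
    return encoded.decode("ascii").upper()


def create_image_data_files(encoded_frames, selected):
    """Decode only the selected frames once, ahead of reproduction."""
    return {index: decode(encoded_frames[index]) for index in selected}


encoded_frames = [b"a", b"b", b"c", b"d"]
image_files = create_image_data_files(encoded_frames, [0, 2])
calls_after_creation = decode_calls

# At display time the stored images are read directly; decode() is not called.
shown = [image_files[index] for index in (0, 2, 0)]
```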

[0030] If an image data file is used (the image data
file may always be used or may be selectively used), an
image data file creating unit 13 (in
the video data processing unit 1) and an image data file
storage unit 4 are further provided as shown in FIG. 3.
If other control information determined
on the basis of the video data is added to the special reproduction
control information, the corresponding function is appro-
priately added to the inside of the video data processing
unit 1.

[0031] If the processing involves an operation by a user,
a GUI (omitted in FIGS. 2 and 3) is used for, for example,
displaying video data in frame units and
receiving input of instructions from the user.

[0032] In FIGS. 2 and 3, a CPU, a memory, an exter-
nal storage device, and a network communication de-
vice, which are provided when needed, and software such as
driver software and an OS, which is used when needed, are not
shown.

[0033] The video data storage unit 2 stores video data
which becomes a target of processing for creating spe-
cial reproduction control information (or special repro-
duction control information and image data files).
[0034] The special reproduction control information
storage unit 3 stores special reproduction control infor-
mation that has been created.

[0035] The image data file storage unit 4 stores image 
data files that have been created. 
[0036] The storage units 2, 3, and 4 comprise, for ex- 
ample, a hard disk, an optical disk and a semiconductor 
memory. The storage units 2, 3, and 4 may comprise 
separate storage devices. All or part of the storage units 
may comprise the same storage device. 
[0037] The video data processing unit 1 creates the
special reproduction control information (or the special
reproduction control information and image data file) on
the basis of the video data which becomes a target of
processing.

[0038] The video location information processing unit
11 determines (extracts) a video frame (group) which
should be displayed or which can be displayed at the
time of special reproduction, and prepares
the video location information 101 which
should be described in each item of frame information "i".
[0039] The display time control information process-
ing unit 12 prepares the dis-
play time control information 102 associated with the
display time of the video frame (group) associated with
each item of frame information "i".

[0040] The image data file creating unit 13 prepares
an image data file from the
video data.

[0041] The special reproduction control information
creating apparatus can be realized, for example, in the
form of software executed on a computer. The appa-
ratus may also be realized as a dedicated apparatus for cre-
ating the special reproduction control information.

[0042] FIG. 4 shows an example of a processing pro-
cedure in the case of the structure of FIG. 2. The video data
is read (step S11), video location information 101 is cre-
ated (step S12), display time control information 102 is
created (step S13), and special reproduction control in-
formation is stored (step S14). The procedure of FIG. 4
may be conducted consecutively for each item of frame infor-
mation, or each processing step may be conducted in
batches. Other procedures can also be used.
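
The batch form of steps S11 to S14 can be sketched as follows. This is only an illustrative outline: the video is replaced by a list of frame indices, the frames are chosen at an equal interval, every item receives the same display time, and the result is serialized as JSON. None of these choices, nor the function names, are prescribed by the embodiment.

```python
import json


def read_video(num_frames):
    """S11: stand-in for reading video data; returns the frame indices."""
    return list(range(num_frames))


def create_video_locations(frames, step):
    """S12: choose the frames to display (here: every `step`-th frame)."""
    return frames[::step]


def create_display_times(locations, seconds_per_item):
    """S13: assign display time control information to each chosen frame."""
    return [seconds_per_item for _ in locations]


def store_control_info(locations, times):
    """S14: serialize the special reproduction control information."""
    items = [{"video_location": loc, "display_time": t}
             for loc, t in zip(locations, times)]
    return json.dumps({"frame_information": items})


def create_special_reproduction_control_info(num_frames, step=30,
                                             seconds_per_item=0.5):
    """Run S11 to S14 in batches, one step after another."""
    frames = read_video(num_frames)
    locations = create_video_locations(frames, step)
    times = create_display_times(locations, seconds_per_item)
    return store_control_info(locations, times)
```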

[0043] FIG. 5 shows an example of a processing pro-
cedure in the case of the structure of FIG. 3. A procedure
for preparing and storing image data files is added to the
procedure of FIG. 4 (step S22). The image data file is
created and/or stored together with the preparation of
the video location information 101. It is also possible to
create the video location information 101 at a timing dif-
ferent from that of FIG. 4. In the same manner as the
case of FIG. 4, the procedure of FIG. 5 may be conduct-
ed for each item of frame information, or may be conducted in
batches. Other procedures can also be used.
[0044] FIG. 6 shows an example of a video reproduc- 
tion apparatus. 

[0045] This video reproduction apparatus comprises
a controller 21, a normal reproduction processing unit
22, a special reproduction processing unit 23, a display
device 24, and a contents storage unit 25. If the contents
to be handled include audio such as sound
added to the video data, it is preferable to provide a
sound output section. If the contents to be handled include
text data added to the video data, the text may be
displayed on the display device 24, or may be output
from the sound output section. If the contents to be handled
have an attached program, an attached program ex-
ecution section may be provided.

[0046] The contents storage unit 25 stores at least
video data and special reproduction control information.
As will be described later in detail, in the case where
the image data file is used, the image data file is further
stored. The sound data, the text data, and the attached
program are further stored in some cases.
[0047] The contents storage unit 25 may be arranged 
at one location in a concentrated manner, or may be ar- 
ranged in a distributed manner. The point is that the con- 
tents can be accessed with the normal reproduction 
processing unit 22 and special reproduction processing 
unit 23. The video data, special reproduction control in- 
formation, image data files, sound data, text data, and 
attached program may be stored in separate media or 
may be stored in the same medium. As the medium, for 
example, DVD is used. These may be data which are 
transmitted via a network. 

[0048] The controller 21 basically receives an instruc-
tion such as a normal reproduction or a special repro-
duction with respect to the contents from the user via a
user interface such as a GUI. The controller
21 then instructs the corresponding processing
unit to reproduce the designated contents by the
designated method.
[0049] The normal reproduction processing unit 22 is 
used for the normal reproduction of the designated con- 
tents. 

[0050] The special reproduction processing unit 23 is 
used for the special reproduction (for example, a high 
speed reproduction, jump reproduction, trick reproduc- 
tion, or the like) of the designated contents by referring 
to the special reproduction control information. 
[0051] The display device 24 is used for displaying a 
video. 

[0052] The video reproduction apparatus can be real- 
ized by computer software. It may partially be realized 
by hardware (for example, decode board (MPEG-2 de- 
coder) or the like). The video reproduction apparatus 
may be realized as a dedicated device for video repro- 
duction. 

[0053] FIG. 7 shows one example of a reproduction
processing procedure of the video reproduction appa-
ratus of FIG. 6. At step S31, it is determined whether
the user requests a normal reproduction or a special repro-
duction. When a normal reproduction is requested, the
designated video data is read at step S32 and a normal
reproduction is conducted at step S33. When a special
reproduction is requested by the user, the special re-
production control information corresponding to the des-
ignated video data is read at step S34, and the location of
the video data to be displayed is specified and the dis-
play time is determined at step S35. The corresponding
frame (group) is read from the video data (or the image
data file) at step S36 to conduct special reproduction of
the designated contents at step S37. The location of the
video data can be specified and the display time can be
determined at a timing different from that in FIG. 7. The
procedure of the special reproduction of FIG. 7 may be
conducted consecutively for each item of frame information, or
each processing step may be conducted in batches. Other
procedures can also be used. For example, in the case
of a reproduction method in which the display time of
each frame is equally set to a constant value, it is not
necessary to determine the display time.
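
The branching of FIG. 7 can be sketched as follows. In this hedged example the "display" is reduced to returning a schedule of (frame index, seconds) pairs; the dictionary keys (fps, num_frames, video_location, display_time) and the function name are assumptions made only for illustration.

```python
def reproduce(mode, video, control_info=None):
    """Return a display schedule as a list of (frame_index, seconds) pairs."""
    if mode == "normal":
        # S32, S33: read the video data and show every frame at the native rate.
        seconds = 1.0 / video["fps"]
        return [(index, seconds) for index in range(video["num_frames"])]
    if mode == "special":
        # S34: the special reproduction control information must be available.
        if control_info is None:
            raise ValueError("special reproduction needs control information")
        schedule = []
        for item in control_info:
            location = item["video_location"]   # S35: specify the location
            seconds = item["display_time"]      # S35: determine the display time
            # S36: "read" the frame; here only its index is checked.
            if not 0 <= location < video["num_frames"]:
                raise IndexError("video location outside the video data")
            schedule.append((location, seconds))  # S37: display
        return schedule
    raise ValueError("unknown reproduction mode: " + str(mode))  # S31 fallthrough
```

A caller would pass "normal" or "special" as the mode decided at step S31, together with the control information read for the designated video data.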
[0054] Both in the normal reproduction and in the spe-
cial reproduction, the user may make various desig-
nations (for example, the start point or the end point
of the reproduction in the contents, the
reproduction speed or the reproduction time in the high
speed reproduction, and
other items such as the method of special reproduction).
[0055] Next, an algorithm for creating the frame infor- 
mation of the special reproduction control information 
and an algorithm for calculating the display time of the 
special reproduction will be schematically explained. 
[0056] At the time of creating the frame information, 
the frame information to be used at the time of the spe- 
cial reproduction is determined from the video data, the 
video location information is created, and the display 
time control information is created. 
[0057] The frame is determined by methods such as
(1) a method for calculating the video frame on the basis
of some characteristic quantity with respect to the video
data (for example, a method for extracting the video
frames such that the total of the characteristic quantity (for
example, the scene change quantity) between the ex-
tracted frames becomes constant, and a method for ex-
tracting the video frames such that the total of the impor-
tance between the extracted frames becomes con-
stant), and (2) a method for calculating the video frame
on a fixed standard (for example, a method for extracting
frames at random, and a method for extracting frames
at an equal interval). The scene change quantity is also
called a frame activity value.
[0058] For the creation of the display time control
information 121, the following are available: (i) a method
for calculating an absolute value or a relative value of
the display time or the number of display frames, (ii) a
method for calculating reference information which serves
as a base for the display time or the number of display
frames (for example, information designated by the user,
characters in the video, sound synchronized with the video,
persons in the video, or the importance obtained on the
basis of a specific pattern in the video), and (iii) a
method for describing both (i) and (ii).

[0059] It is possible to appropriately combine (1) or
(2) with (i), (ii) or (iii). Needless to say, other methods
are possible. One specific combination of these methods may
be used, or a plurality of combinations may be provided and
appropriately selected.

[0060] In a specific case, a relative value of the display
time and the number of display frames are determined at the
same time as the frame is determined by the method (1). If
this method is always used, it is possible to omit the
display time control information processing unit 102.

[0061] At the time of the special reproduction, it is
assumed that the special reproduction is conducted by
referring to the display time control information 121 of
(i), (ii) or (iii) included in the frame information. The
described value may be followed as it is, or the described
value may be corrected and used. In addition to the
described value and the corrected value thereof, other
independently created information and information input by
the user may be used. Alternatively, only the other
independently created information and the information input
by the user may be used. A plurality of these methods may
be enabled and appropriately selected.

[0062] Next, an outline of the special reproduction will 
be explained. 

[0063] A double speed reproduction (or a high speed
reproduction) carries out reproduction in a time shorter
than the time required for the normal reproduction of the
original contents by reproducing only a part of the frames
constituting the video data contents. For example, the
frames indicated by the frame information are displayed in
time sequence, each for the display time indicated by the
display time control information 121. Based on a request
from the user, such as a speed designation request
designating at how many times the normal reproduction speed
the original contents are to be reproduced (that is, in
what fraction of the time required for the normal
reproduction) or a time designation request designating how
much time is to be taken for reproducing the contents, the
display time of each frame (group) is determined so as to
satisfy the reproduction request. The high speed
reproduction is also called a summarized reproduction.
[0064] A jump reproduction (or a jump continuous
reproduction) is a high speed reproduction in which some of
the frames shown in the frame information are not
reproduced, for example, on the basis of the
reproduction/non-reproduction information described later.
The high speed reproduction is conducted on the frames
shown in the frame information, excluding the frames which
are subjected to non-reproduction.
[0065] A trick reproduction is any reproduction other than
the normal reproduction, excluding the high speed
reproduction and the jump reproduction. For example, at the
time of reproducing the frames shown in the frame
information, various forms can be considered, such as: a
substituted reproduction for reproducing a certain portion
with the order of the time sequence changed; an overlapped
reproduction for reproducing a certain portion repeatedly a
plurality of times; a variable speed reproduction in which
a certain portion is reproduced at a speed lower than that
of another portion (including the case in which the portion
is reproduced at the speed of the normal reproduction, or
the case in which the portion is reproduced at a speed
lower than that of the normal reproduction) or at a speed
higher than that of another portion, or in which the
reproduction of a certain portion is temporarily suspended,
or in which such forms of reproduction are appropriately
combined; and a random reproduction for reproducing, in a
random time sequence, each of constant sets of frames shown
in the frame information.

[0066] Needless to say, it is possible to appropriately
combine a plurality of kinds of methods. For example,
various variations are considered, such as a method in
which, during the double speed reproduction, the important
portion is reproduced a plurality of times or its
reproduction speed is set to the normal reproduction speed.

[0067] Hereinafter, embodiments of the present in- 
vention will be specifically explained in detail. 
[0068] First, the embodiments will be explained by taking
as an example a case in which a reproduction frame is
determined on the basis of the scene change quantity
between adjacent frames as the characteristic quantity of
the video data.
[0069] Here, there will be explained a case in which one
frame corresponds to one item of frame information.

[0070] FIG. 8 shows one example of a data structure of the
special reproduction control information created for the
target video data.

[0071] In this data structure, the display time information
121, which is information showing an absolute or a relative
display time, is described as the display time control
information 102 in FIG. 1 (or instead of the display time
control information 102). A structure describing the
importance in addition to the display time control
information 102 will be described later.

[0072] The video location information 101 is information
which enables the location of the video in the original
video frames to be specified, and either a frame number
(for example, a sequence number from the first frame) or a
number which specifies one frame in a stream, such as a
time stamp, may be used. If the video data corresponding to
the frame extracted from the original video stream is
stored as a separate file, a URL or the like may be used as
information for specifying the file location.

[0073] The display time information 121 is information
which specifies the time for displaying the video or the
number of frames. It is possible to describe an actual time
or number of frames as a unit, or a relative value (for
example, a normalized numeric value) which clarifies the
relative time length with respect to the display time
information described in the other items of frame
information. In the latter case, the actual reproduction
time of each video is calculated from the total
reproduction time as a whole. Instead of describing the
continuation time of the display for each video, a
description combining a start time measured from a specific
timing (for example, the start time of the first video is
set to 0) with an end time, or a description combining the
start time with the continuation time, may be used.
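
The conversion from relative display-time values to actual times can be sketched as follows. This is only an illustration under stated assumptions: the function names and the choice of seconds as the unit are hypothetical and do not appear in the specification.

```python
def actual_display_times(relative_values, total_time):
    """Scale relative display-time values so that they sum to total_time.

    relative_values: one normalized (relative) value per item of frame
    information. total_time: desired total reproduction time in seconds.
    Both names are illustrative assumptions.
    """
    total_relative = sum(relative_values)
    if total_relative <= 0:
        raise ValueError("relative values must have a positive sum")
    return [total_time * r / total_relative for r in relative_values]


def to_start_end(durations, origin=0.0):
    """Convert per-frame durations into (start, end) pairs.

    The start time of the first video is set to `origin` (0 by default).
    """
    spans = []
    start = origin
    for duration in durations:
        spans.append((start, start + duration))
        start += duration
    return spans
```

For instance, relative values of 1, 2 and 1 with a total reproduction time of 8 seconds give display times of 2, 4 and 2 seconds, which `to_start_end` turns into consecutive start/end pairs beginning at 0.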

[0074] In the special reproduction, basically, the video
present at the location specified with the video location
information 101 is reproduced for the display time
specified with the display time information 121, and this
is consecutively conducted for each of the items of frame
information "i" included in the arrangement, such as shown
in FIG. 8.
[0075] If the start time and the end time or the contin- 
uation time are specified and this designation is fol- 
lowed, the video present at the location specified with 
the video location information 101 is consecutively re- 
produced from the start time specified with the display 
time information 121 up to the end time or during the 
continuation time only for the number of items of the 
frame information "i" included in the arrangement. 
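
A minimal sketch of how a player might turn such items of frame information into a reproduction plan is given below. The `FrameInfo` class, its field names and the string form of the location are assumptions made for illustration; the specification only requires that a location and either an end time or a continuation time be recoverable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameInfo:
    """One item of frame information (hypothetical layout)."""
    location: str                     # frame number, time stamp or URL
    start: float                      # start time in seconds
    end: Optional[float] = None       # end time, if described
    duration: Optional[float] = None  # continuation time, if described

    def span(self):
        """Return (start, end), whichever of the two forms is described."""
        if self.end is not None:
            return (self.start, self.end)
        if self.duration is not None:
            return (self.start, self.start + self.duration)
        raise ValueError("either end or duration must be described")


def playback_plan(items):
    """List (location, start, end) for every item, in array order."""
    return [(item.location, *item.span()) for item in items]
```

Both description forms (start and end, or start and continuation time) resolve to the same kind of entry, so the reproduction loop itself does not need to know which form was used.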
[0076] The described display time can be processed and
reproduced by using parameters such as reproduction rate
information and additional information.
[0077] Next, a method for describing the video loca- 
tion information will be explained by using FIGS. 9 
through 11. 

[0078] FIG. 9 explains a method for describing the vid- 
eo location information referring to the original video 
frame. 

[0079] In FIG. 9, a time axis 200 corresponds to the
original video stream on the basis of which the frame
information for the special reproduction is created, and a
video 201 corresponds to one frame which becomes a
description target in the video stream. A time axis 202
corresponds to the reproduction time of the video at the
time of the special reproduction using the videos 201
extracted from the original video stream. A display time
203 is a section on the time axis 202 corresponding to one
video 201. For example, the video location information 101
showing the location of the video 201 and the video display
time 121 showing the length of the display time 203 are
described as frame information. As described above, the
location of the video 201 may be described in any form,
such as a frame number or a time stamp, as long as one
frame in the original video stream can be specified. The
frame information is described in the same manner for the
other videos 201.

[0080] FIG. 10 explains a method for describing the 
video location information referring to the image data 
file. 

[0081] The method for describing the video location
information shown in FIG. 9 directly refers to the frame in
the original video which is to be subjected to the special
reproduction. The method for describing the video location
information shown in FIG. 10 is a method in which an image
data file 300 corresponding to a single frame 302 extracted
from the original video stream is created as a separate
file, and the location thereof is described. The file
location can be described in the same manner by using, for
example, a URL or the like both in the case where the file
is present on a local storage device and in the case where
the file is present on a network. A set of the video
location information 101 showing the location of this image
data file and the video display time 121 showing the length
of the corresponding display time 301 is described as frame
information.

[0082] If a correspondence to the original video frame is
required, information showing the single frame 302 of the
original video corresponding to the described frame
information (similar to, for example, the video location
information in the case of FIG. 9) may be included in the
frame information. The frame information then comprises the
video location information, the display time information
and the original video information. When the original video
information is not required, it need not be described.
[0083] The configuration of the video data described with
the method of FIG. 10 is not particularly restricted. For
example, the frame of the original video may be used as it
is or may be reduced. This is effective for conducting the
reproduction processing at a high speed because it is not
required to develop the original video.
[0084] If the original video stream is compressed by means
of MPEG-1, MPEG-2 or the like, a reduced video can be
created at a high speed by only partially decoding the
stream. In this method, only the DCT (discrete cosine
transform) coefficients of the I picture frames (the
intra-frame encoded frames) are decoded, and a reduced
video is created by using the DCT coefficients.
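
One common way to build such a reduced image, shown here only as an assumption since the text does not say which coefficients are used, is to keep just the DC coefficient of each 8x8 block. With the usual 8x8 DCT normalization the DC coefficient equals eight times the mean pixel value of the block, so each block yields one pixel of a 1/8-scale image. The sketch ignores quantization and any level shift, and the function names are hypothetical.

```python
def dc_coefficient(block):
    """DC term of the orthonormal 8x8 two-dimensional DCT of a pixel block.

    For an 8x8 block this is (sum of all 64 pixels) / 8, that is,
    eight times the block mean.
    """
    return sum(sum(row) for row in block) / 8.0


def reduced_image_from_dc(dc_grid):
    """Build a 1/8-scale image from a grid of per-block DC coefficients.

    dc_grid[r][c] is the decoded DC coefficient of the block at block
    row r and block column c. Each output pixel is the block mean,
    clamped to the 8-bit range.
    """
    return [
        [max(0, min(255, round(dc / 8.0))) for dc in row]
        for row in dc_grid
    ]
```

Because no inverse DCT is performed, the cost per frame is one division per block, which is consistent with the stated aim of creating the reduced video at high speed.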

[0085] In the description method of FIG. 10, the image data
files are stored as separate files. However, these files
may be stored together in a video data group storage file
having a video format (for example, Motion JPEG) which can
be accessed at random. In this case, the location of the
video data is specified by a combination of the URL showing
the location of the image data file and a frame number or a
time stamp showing the location in the image data file. The
URL information showing the location of the image data file
may be described in each item of frame information or may
be described as additional information outside of the
arrangement of the frame information.
[0086] Various methods can be adopted to select the frame
of the original video or the like and create the video data
described in the video location information. For example,
the video data may be extracted at an equal interval from
the original video. Alternatively, the video data may be
selected at a narrow interval where motion appears
frequently on the screen, and at a wide interval where
motion rarely appears on the screen.

[0087] Here, referring to FIG. 11, there will be explained,
as one example of a method for selecting frames, a method
in which the frame is selected at a narrow interval where
motion appears frequently on the screen and at a wide
interval where motion rarely appears on the screen.
[0088] In FIG. 11, a horizontal axis represents the frame
number, and a curve 800 represents a change in the scene
change quantity (between adjacent frames). The method for
calculating the scene change quantity is the same as the
method used at the time of calculating the display time
described later. Here, in order to determine an extraction
interval in accordance with the motion of the scene, there
is shown a method for calculating an interval at which the
scene change quantity between the video frames from which
the video data is extracted becomes constant. The total of
the scene change quantity between the video frames from
which the video data is extracted is set to $S_i$, the
total of the scene change quantity over all frames is set
to $S = \sum_i S_i$, and the number of data items to be
extracted is set to $n$. In order to set the scene change
quantity between the video frames from which the video data
is extracted to a constant level, $S_i = S/n$ may be
provided. In FIG. 11, the areas $S_i$ of the scene change
quantity curve 800 divided with the broken lines are
constant. Then, for example, the scene change quantity is
accumulated from the previously extracted frame, and the
video frame at which the accumulated value exceeds $S/n$ is
set as the frame $F_i$ from which the video data is
extracted.
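
The accumulation procedure can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name is hypothetical, `scene_change[k]` is assumed to hold the scene change quantity between frame k-1 and frame k, and the comparison uses "reaches or exceeds" so that exactly equal areas also trigger an extraction.

```python
def select_frames(scene_change, n):
    """Pick frames so that the scene change between picks is about S/n.

    scene_change: per-frame scene change quantity (assumed layout).
    n: number of data items to extract.
    Returns the frame numbers of the extracted frames.
    """
    total = sum(scene_change)  # S
    if n <= 0 or total <= 0:
        return []
    threshold = total / n      # S/n
    selected = []
    accumulated = 0.0
    for frame_no, quantity in enumerate(scene_change):
        accumulated += quantity
        if accumulated >= threshold:
            selected.append(frame_no)
            accumulated = 0.0  # restart from the extracted frame
    return selected
```

With a burst of motion at frame 1 followed by slow motion, the burst frame is extracted immediately while the slow section contributes only one frame, which reproduces the narrow and wide intervals described above.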

[0089] If the video data is created from the I picture
frames of MPEG, the calculated video frame is not
necessarily an I picture; in that case, the video data is
created from an I picture frame in the vicinity thereof.
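
A sketch of this substitution is shown below. Taking the nearest I picture (ties resolved toward the earlier one) is an assumption; the text only says "in the vicinity". The function name and the sorted list of I picture positions are likewise hypothetical.

```python
import bisect


def nearest_i_picture(frame_no, i_picture_positions):
    """Return the I picture position closest to frame_no.

    i_picture_positions must be a sorted, non-empty list of frame
    numbers at which I pictures occur.
    """
    if not i_picture_positions:
        raise ValueError("no I pictures available")
    idx = bisect.bisect_left(i_picture_positions, frame_no)
    # Only the neighbours on either side of the insertion point matter.
    candidates = i_picture_positions[max(0, idx - 1): idx + 1]
    return min(candidates, key=lambda pos: (abs(pos - frame_no), pos))
```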

[0090] In the method explained with FIG. 11, the video
frames which belong to a section in which the scene change
quantity = 0 are skipped. However, if a still picture
continues, the scene is important in many cases. Therefore,
if the scene change quantity = 0 continues for more than a
constant time, the frame at that time may be extracted. For
example, the scene change quantity may be accumulated from
the previously extracted frame, and either the frame at
which the accumulated value exceeds $S/n$ or the frame at
which the scene change quantity = 0 has continued for more
than a constant time may be set as a frame $F_i$ from which
the video data is extracted. In this case, the accumulated
value of the scene change quantity may or may not be
cleared to 0. It is possible to selectively clear the
accumulated value based on a request from the user.
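
The still-picture variant can be sketched in the same style. The names `still_run` (the "constant time", counted in frames) and `clear_on_still` (whether the accumulated value is cleared when a still frame is extracted) are assumptions, as is the choice to restart the zero-run counter after each extraction.

```python
def select_frames_with_stills(scene_change, n, still_run, clear_on_still=True):
    """Extract frames on accumulated change S/n or on long still sections.

    scene_change[k]: scene change quantity between frame k-1 and frame k
    (assumed layout). A frame is also extracted once the quantity has
    been 0 for more than still_run consecutive frames.
    """
    total = sum(scene_change)
    if n <= 0 or total <= 0:
        return []
    threshold = total / n
    selected = []
    accumulated = 0.0
    zero_run = 0
    for frame_no, quantity in enumerate(scene_change):
        accumulated += quantity
        zero_run = zero_run + 1 if quantity == 0 else 0
        if accumulated >= threshold:
            selected.append(frame_no)
            accumulated = 0.0
            zero_run = 0
        elif zero_run > still_run:
            selected.append(frame_no)
            zero_run = 0
            if clear_on_still:
                accumulated = 0.0
    return selected
```

Leaving the accumulated value uncleared makes the next motion-based extraction arrive sooner after a still section; clearing it treats the still frame as a fresh starting point.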
[0091] In the example of FIG. 11, it is assumed that the
display time information 121 is described so that the
display time is the same for all of the frames. When the
video is reproduced in accordance with this display time
information 121, the scene change quantity per unit time
becomes constant. The display time information 121 may be
determined and described by a different method.

[0092] Next, there will be explained a case in which 
one or a plurality of frames are allowed to correspond 
to one frame information. 

[0093] One example of the data structure of the spe- 
cial reproduction information in this case is the same as 
that in FIG. 8. 

[0094] Hereinafter, a method for describing the video 
location information will be explained by using FIGS. 12 
through 14. 

[0095] FIG. 12 explains a method for describing the video
location information for referring to continuous frames of
the original video.

[0096] The method for describing the video location
information shown in FIG. 9 refers to one frame 201 in the
original video for conducting the special reproduction. In
contrast, the method for describing the video location
information shown in FIG. 12 describes a set 500 of a
plurality of continuous frames in the original video. The
set 500 of frames may include only some frames extracted
from the plural continuous frames within the original
video. The set 500 of frames may include only one frame.

[0097] If the set 500 of frames includes a plurality of
continuous frames or one frame in the original video, the
frame location is described either by the location of the
start frame and the location of the end frame, or by the
location of the start frame and the continuation time of
the set 500 (if only one frame is included, for example,
the start frame is set equal to the end frame). For the
description of the location and the time, a frame number, a
time stamp or the like which can specify frames in the
stream is used.
[0098] If the set 500 of frames is a part of a plurality of
continuous frames in the original video, information which
enables the frames to be specified is described. If the
method for extracting the frames is determined in advance
and the frames can be specified from the description of the
locations of the start frame and the end frame, the start
frame or the end frame may be described.

[0099] The display time information 501 shows the total
display time corresponding to the whole frame group
included in the corresponding frame set 500. The display
time of each frame included in the set 500 of frames can be
appropriately determined on the side of the device for the
special reproduction. As a simple method, the above total
display time is equally divided by the total number of
frames in the set 500 to provide the display time of one
frame. Various other methods are available.
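
The simple equal-division method can be written as a one-line calculation. The function name and the use of inclusive start and end frame numbers are assumptions for illustration.

```python
def per_frame_display_time(total_display_time, start_frame, end_frame):
    """Divide the total display time equally over a set of frames.

    start_frame and end_frame are inclusive frame numbers; a set with a
    single frame has start_frame == end_frame.
    """
    frame_count = end_frame - start_frame + 1
    if frame_count <= 0:
        raise ValueError("end_frame must not precede start_frame")
    return total_display_time / frame_count
```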

[0100] FIG. 13 explains a method for describing the video
location information for referring to a set of image data
files.

[0101] The method for describing the video location
information shown in FIG. 12 directly refers to continuous
frames in the original video to be reproduced. The method
for describing the video location information shown in
FIG. 13 creates a set 600 of image data files corresponding
to the original video frame set 602 extracted from the
original video stream in a separate file and describes the
location thereof. The file location can be described in the
same manner by using, for example, a URL or the like,
whether the file is present on a local storage device or on
a network. A set of the video location information 101
showing the location of this image data file and the video
display time 121 showing the length of the corresponding
display time 601 can be described as the frame information.
[0102] If a correspondence with the original frames is
required, information showing the frame set 602 of the
original video corresponding to the described frame
information (for example, information similar to the video
location information in the case of FIG. 12) may be
included in the frame information. The frame information
then comprises the video location information, the display
time information and the original video information. The
original video information need not be described when it is
not required.
[0103] The configuration of the video data, the prep- 
aration of the video data, the preparation of the reduced 
video, the method for storing the video data and the 
method for describing the location information such as 
the URL or the like are the same as what has been de- 
scribed above. 

[0104] As described above, various methods can be adopted
as to which frames of the original video are selected to
create the video data described in the video location
information. For example, the video data may be extracted
at an equal interval from the original video.
Alternatively, frames may be extracted at a narrow interval
where motion appears frequently on the screen, and at a
wide interval where motion rarely appears on the screen.
[0105] In the above embodiments, the image data file 300
corresponds to the original video 302 on a frame-to-frame
basis. It is also possible to make the location information
of the frame described as the original video information
have a time width.

[0106] FIG. 14 shows an example in which the original video
information is allowed to have a time width in the
structure of FIG. 8. Original video information 3701 is
added to the frame information structure shown in FIG. 8.
The original video information 3701 comprises start point
information 3702 and section length information 3703, which
are the start point and the section length of the section
of the original video which is a target of the special
reproduction. The original video information 3701 may
comprise any information which can specify the section of
the original video having the time width. It may comprise
the start point information and end point information
instead of the start point information and the section
length information.

[0107] FIG. 15 shows an example in which the original video
information is allowed to have a time width in the method
of FIG. 9. In this case, for example, as the video location
information, the display time information and the original
video information included in the same frame information,
the location of the original video frame 3801, the display
time 3802, and the original video frame section 3803 which
comprises the start point (frame location) and the section
length are described to show that these correspond to each
other. That is, the original video frame 3801 described in
the video location information is displayed as a video
representative of the original video frame section 3803.
[0108] FIG. 16 shows an example in which the original video
information is allowed to have a time width in the method
of FIG. 10. In this case, for example, as the video
location information, the display time information and the
original video information included in the same frame
information, the location of the image data file 3901 for
the display, the display time 3902, and the original video
frame section 3903 which comprises the start point (frame
location) and the section length are described to show that
these correspond to each other.

[0109] That is, the image 3901 in the image data file
described in the video location information is displayed as
a video representative of the original video frame section
3903.

[0110] Furthermore, as shown in FIGS. 12 and 13, if a set
of frames is used as the video for the display, a section
different from the original video frame section used for
displaying the video may be made to correspond to the
original video information.

[0111] FIG. 17 shows an example in which the original video
information is allowed to have a time width in the method
of FIG. 12. In this case, for example, as the video
location information, the display time information and the
original video information included in the same frame
information, a set 4001 of frames in the original video,
the display time 4002, and the original video frame section
4003 which comprises the start point (frame location) and
the section length are described to show that these
correspond to each other.

[0112] At this time, the section 4001 of the set of frames
described as the video location information and the
original video frame section 4003 described as the original
video information are not necessarily required to coincide
with each other, and a different section may be used for
the display.

[0113] FIG. 18 shows an example in which the original video
information is allowed to have a time width in the method
of FIG. 13. In this case, for example, as the video
location information, the display time information and the
original video information included in the same frame
information, a set 4101 of frames in the video file, the
display time 4102, and the original video frame section
4103 which comprises the start point (frame location) and
the section length are described to show that these
correspond to each other.

[0114] At this time, the section of the set 4101 of frames
described as the video location information and the
original video frame section 4103 described as the original
video information are not necessarily required to coincide
with each other. That is, the section of the set 4101 of
frames for the display may be shorter or longer than the
original video frame section 4103. Furthermore, a video
having completely different contents may be included
therein. In addition, only particularly important sections
may be extracted from the section described in the original
video information and collected into the image data file,
so that the collected video data is used.

[0115] At the time of displaying the videos based on, for
example, the summarized reproduction (special reproduction)
using these items of frame information, it may be desired
to refer to the corresponding frame in the original video.

[0116] FIG. 19 shows a flow for starting the reproduction
from the frame of the original video corresponding to the
video frame displayed in the special reproduction. At step
S3601, the reproduction start frame is specified in the
special reproduction. At step S3602, the original video
frame corresponding to the specified frame is calculated
with a method described later. At step S3603, the original
video is reproduced from the calculated frame.
[0117] This flow can be used for referring to the cor- 
responding location of the original video in addition to 
special reproduction. 

[0118] As one example of a method for calculating the
corresponding original video frame at step S3602, there is
shown a method using proportional distribution with respect
to the display time of the specified frame. The display
time information included in the i-th frame information is
set to $D_i$ sec, the section start location of the
original video information is set to $t_i$ sec, and the
section length is set to $d_i$ sec. If the location is
specified at which $t$ sec has passed from the start of the
reproduction using the i-th frame information, the frame
location of the corresponding original video is
$T = t_i + d_i \times t / D_i$.
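
The proportional distribution can be sketched as a small function computing T = t_i + d_i * t / D_i. The function and parameter names are illustrative assumptions; all quantities are taken to be in seconds.

```python
def original_position(t, display_time, section_start, section_length):
    """Map elapsed time t within the i-th summary item to the original video.

    display_time:   D_i, display time of the i-th frame information
    section_start:  t_i, start of the corresponding original section
    section_length: d_i, length of the corresponding original section
    """
    if display_time <= 0:
        raise ValueError("display_time must be positive")
    if not 0 <= t <= display_time:
        raise ValueError("t must lie within the display time")
    return section_start + section_length * t / display_time
```

For example, an item displayed for 2 seconds that stands for a 30-second section starting at 60 seconds maps the midpoint of its display (t = 1) to 75 seconds in the original video.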

[0119] Referring to FIGS. 20 and 21, there will be
explained, as examples of a method for selecting frames, a
method for extracting the frames at a narrow interval where
motion appears frequently on the screen and at a wide
interval where motion rarely appears on the screen, in
accordance with the motion of the screen. The horizontal
axis, the curve 800, $S_i$ and $F_i$ are the same as those
in FIG. 11.
[0120] In the example of FIG. 11, the video data is
extracted one frame at a time at an interval at which the
scene change quantity between the frames from which the
video data is extracted is made constant. FIGS. 20 and 21
show examples in which a set of a plurality of frames is
extracted with the frame $F_i$ as a reference. For example,
as shown in FIG. 20, the same number of continuous frames
may be extracted from each $F_i$; the frame length 811 and
the frame length 812 are equal to each other.
Alternatively, as shown in FIG. 21, a corresponding number
of continuous frames may be extracted so that the total of
the scene change quantity from $F_i$ becomes constant; the
area 813 and the area 814 are equal to each other. Various
other methods can be considered.
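
The two ways of growing a frame set from a reference frame can be sketched as follows. The function names, the inclusive (start, end) return form and the convention that `scene_change[k]` is the change between frame k-1 and frame k are assumptions made for illustration.

```python
def fixed_length_set(start_frame, length, last_frame):
    """FIG. 20 style: the same number of continuous frames from each F_i.

    Returns inclusive (start, end) frame numbers, clipped to last_frame.
    """
    end_frame = min(start_frame + length - 1, last_frame)
    return (start_frame, end_frame)


def constant_change_set(start_frame, scene_change, target):
    """FIG. 21 style: extend the set until the accumulated change hits target.

    The set always begins at start_frame and grows one frame at a time;
    it stops at the end of the video if the target is never reached.
    """
    accumulated = 0.0
    end_frame = start_frame
    for frame_no in range(start_frame + 1, len(scene_change)):
        accumulated += scene_change[frame_no]
        end_frame = frame_no
        if accumulated >= target:
            break
    return (start_frame, end_frame)
```

In the second form, sets taken from fast-moving sections are short and sets taken from slow sections are long, while each set covers about the same total scene change.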
[0121] It is also possible to use the frame selection
method in which a frame is extracted when the scene change
quantity = 0 continues for more than a constant time.
[0122] As in the case of FIG. 11, the display time
information 121 may be described so that the same display
time is provided for all of the frame sets in the cases of
FIGS. 20 and 21. Alternatively, the display time
information may be determined and described by a different
method.

[0123] Next, one example of a processing for calculating
the display time will be explained.
[0124] FIG. 22 shows one example of a procedure of the
basic processing for calculating the display time so that
the scene change quantity becomes as constant as possible
when the videos described in the video location information
are continuously reproduced in accordance with the times
described in the display time information.

[0125] This processing can be applied to a case in which
the frames are extracted by any method. However, if the
frames are extracted by the method shown in FIG. 11, the
processing can be omitted, since the processing shown in
FIG. 11 selects the frames such that the scene change
quantity becomes constant when the frames are each
displayed for a fixed time period.
[0126] At step S71, the scene change quantity between
adjacent frames is calculated for all frames of the
original video. If each frame of the video is represented
as a bit map, the differential value of the pixels between
adjacent frames can be used as the scene change quantity.
If the video is compressed with MPEG, the scene change
quantity can be calculated by using motion vectors.
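
For the bit map case, one possible realization is the mean absolute pixel difference between adjacent frames. This particular measure is an assumption, since the text only speaks of "the differential value of the pixel"; frames are represented here as flat lists of gray values, and the function names are hypothetical.

```python
def frame_difference(prev_frame, next_frame):
    """Mean absolute difference between two frames of equal size."""
    if len(prev_frame) != len(next_frame) or not prev_frame:
        raise ValueError("frames must be non-empty and of equal size")
    total = sum(abs(a - b) for a, b in zip(prev_frame, next_frame))
    return total / len(prev_frame)


def scene_change_quantities(frames):
    """Scene change quantity for every pair of adjacent frames."""
    return [
        frame_difference(frames[k], frames[k + 1])
        for k in range(len(frames) - 1)
    ]
```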

[0127] One example of a method for calculating the 
scene change quantity will be explained. 
[0128] FIG. 23 shows one example of a basic 
processing procedure for calculating a scene change 
quantity of all frames from the video streams com- 
pressed with MPEG. 

[0129] At step S81, motion vectors are extracted from the P
picture frames. The video frames compressed with MPEG are
described with an arrangement of I pictures (intra-frame
encoded frames), P pictures (inter-frame encoded frames
using forward prediction), and B pictures (inter-frame
encoded frames using bidirectional prediction), as shown in
FIG. 24. The P picture includes motion vectors
corresponding to the motion from the preceding I picture or
P picture.

[0130] At step S82, the magnitude (intensity) of each motion
vector included in one P picture frame is calculated, and the
average thereof is set as the scene change quantity from the
preceding I picture or P picture.

[0131] At step S83, on the basis of the scene change quantity
calculated for each P picture, the scene change quantity per
frame is calculated, including for the frames other than the P
picture. For example, if the average magnitude of the motion
vectors of a P picture frame is p, and the interval from the
preceding I picture or P picture to which the P picture refers is
d frames, the scene change quantity of each frame in that
interval is set to p/d.
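
Steps S82 and S83 can be sketched in Python as follows. This is a
minimal sketch under stated assumptions: motion vectors are given
as (dx, dy) pairs, and the function names are hypothetical. Actual
extraction of motion vectors from an MPEG stream (step S81) is not
shown.

```python
import math


def p_picture_change(motion_vectors):
    """Step S82: average magnitude of the motion vectors of one P picture."""
    if not motion_vectors:
        return 0.0
    magnitudes = [math.hypot(dx, dy) for dx, dy in motion_vectors]
    return sum(magnitudes) / len(magnitudes)


def per_frame_change(p, d):
    """Step S83: spread p evenly over the d frames since the reference picture."""
    return [p / d] * d
```

With p = 6 and an interval of d = 3 frames, each of the three
frames is assigned a scene change quantity of 2.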

[0132] Subsequently, at step S72 in the procedure of FIG. 22, for
each description target frame described in the video location
information, the scene change quantities of the frames from that
description target frame up to the next description target frame
are totaled.

[0133] FIG. 25 describes a change in the scene change quantity
for each frame. The horizontal axis corresponds to the frame
number while a curve 1000 denotes a change in the scene change
quantity. When the display time of the video having the location
information of the frame information $F_i$ is calculated, the
scene change quantity is accumulated over the section 1001 up to
$F_{i+1}$, which corresponds to the frame location of the next
description target frame. This accumulated value is the area
$S_i$ of the hatched portion 1002, and is regarded as the
magnitude of the motion at the frame location $F_i$.
[0134] Subsequently, at step S73 in the procedure of FIG. 22, the
display time of each frame is calculated. In order to keep the
scene change quantity as constant as possible, it suffices to
allocate a longer display time to a frame where the motion of the
screen is large, so the ratio of the display time allocated to
the video at each frame location $F_i$ to the total reproduction
time may be set to $S_i / \sum_j S_j$. When the total
reproduction time is $T$, the display time of each video is
$D_i = T \times S_i / \sum_j S_j$. The value of the total
reproduction time $T$ is defined as the total reproduction time
of the original video.
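
Steps S72 and S73 can be sketched as follows. The function names
are hypothetical, `change[k]` is assumed to hold the per-frame
scene change quantity of frame k, and the last section is assumed
to run to the end of the video. The even split used when every
total is zero is also an assumption, not something the description
specifies.

```python
def section_totals(change, targets):
    """Step S72: total the per-frame quantities between target frames.

    `targets` lists the frame numbers of the description target
    frames in ascending order.
    """
    bounds = list(targets) + [len(change)]
    return [sum(change[a:b]) for a, b in zip(bounds, bounds[1:])]


def display_times(totals, total_time):
    """Step S73: allocate total_time in proportion to each total S_i."""
    grand_total = sum(totals)
    if grand_total == 0:
        # Assumption: with no motion anywhere, share the time evenly.
        return [total_time / len(totals)] * len(totals)
    return [total_time * s / grand_total for s in totals]
```

For example, totals of 2, 2 and 6 with T = 10 seconds give display
times of 2, 2 and 6 seconds.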
[0135] If no scene change appears and $S_i = 0$, a lower limit
value (for example, 1) which is determined in advance may be
entered, or the frame information thereof may not be described.
Even if $S_i = 0$ does not hold, for a frame where the screen
change is so small that virtually no change would be displayed
in the actual reproduction, the lower limit value may be
substituted or the frame information may not be described. If
the frame information is not described, the value of $S_i$ may
or may not be added to $S_{i+1}$.
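
The two options for very small totals can be sketched as follows.
The function names and the `tiny` cutoff parameter are hypothetical
and only illustrate one possible reading of the options above.

```python
def substitute_lower_limit(totals, lower_limit=1, tiny=0.0):
    """Option 1: replace totals at or below `tiny` by the lower limit."""
    return [lower_limit if s <= tiny else s for s in totals]


def drop_small_items(totals, tiny=0.0, carry=True):
    """Option 2: omit frame information whose total is at or below `tiny`.

    Returns (index, total) pairs for the items that are kept. When
    `carry` is true, an omitted total is added to the next kept one.
    """
    kept = []
    carried = 0.0
    for i, s in enumerate(totals):
        if s <= tiny:
            if carry:
                carried += s
            continue
        kept.append((i, s + carried))
        carried = 0.0
    return kept
```

Carrying the omitted total forward keeps the sum of all totals, and
therefore the proportions of the remaining display times, closer to
the original.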
[0136] The processing for calculating this display time can be
conducted by the special reproduction control information
creating apparatus when the frame information is prepared, but
it can also be conducted by the video reproduction apparatus at
the time of the special reproduction.
[0137] Next, there will be explained a case in which 
the special reproduction is conducted. 
[0138] FIG. 26 shows one example for the N times 
high-speed reproduction on the basis of the special re- 
production control information that has been described. 
[0139] At step S111, the display time $D'_i$ at the time of
reproduction is calculated on the basis of the reproduction rate
information. Since the display time information described in the
frame information is the standard display time, the display time
of each frame is calculated as $D'_i = D_i / N$ when
reproduction at N times high speed is conducted.
[0140] At step S112, initialization for the display is 
conducted, and i = 0 is set so that the first frame infor- 
mation is displayed. 

[0141] At step S113, it is determined whether the display time
$D'_i$ of the i-th frame information is larger than the preset
threshold value of the display time.
[0142] If the display time is larger, the video indicated by the
video location information included in the i-th frame
information $F_i$ is displayed for $D'_i$ seconds at step S114.



[0143] If the display time is not larger, the process proceeds to
step S115 to search in the forward direction for the i-th frame
information whose display time is not smaller than the threshold
value. During the search, the display times of all the frame
information passed over because they are smaller than the
threshold value are added to the display time of the i-th frame
information that is found, and the display times of the frame
information passed over are set to 0. Such processing is
conducted because, when the display time at the time of
reproduction becomes very short, the time for preparing the video
to be displayed becomes longer than the display time, with the
result that the display cannot be conducted in time. Therefore,
if the display time becomes very short, the process proceeds to
the next frame information without displaying the video. At that
time, the display time of the video which is not displayed is
added to the display time of a video to be displayed, so that
the total display time remains unchanged.
[0144] At step S116, it is determined whether "i" is smaller than
the total number of frame information items, in order to
determine whether frame information which has not yet been
displayed remains. If "i" is smaller than the total number of
frame information items, the process proceeds to step S117 to
increment "i" by one to prepare for the display of the next
frame information. When "i" reaches the total number of frame
information items, the reproduction processing is completed.
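
The procedure of FIG. 26 (steps S111 to S117) can be sketched as a
function that returns a playback plan instead of actually
displaying video. The function name is hypothetical. How leftover
time is handled when the final items fall below the threshold is
not stated in the description; adding it to the last displayed item
is an assumption of this sketch.

```python
def plan_high_speed_playback(display_times, n, threshold):
    """Return (frame_index, seconds) pairs for N-times playback.

    Items whose scaled display time does not exceed `threshold` are
    skipped, and their time is carried to the next displayed item.
    """
    scaled = [d / n for d in display_times]      # step S111
    plan = []
    carried = 0.0
    for i, d in enumerate(scaled):               # steps S112, S116, S117
        if d > threshold:                        # step S113
            plan.append((i, d + carried))        # step S114
            carried = 0.0
        else:
            carried += d                         # step S115
    if carried and plan:
        # Assumption: trailing skipped time goes to the last shown item.
        last_i, last_d = plan[-1]
        plan[-1] = (last_i, last_d + carried)
    return plan
```

With standard display times of 4.0, 0.5, 0.5 and 2.0 seconds,
N = 2 and a threshold of 0.5 seconds, items 1 and 2 are skipped and
the total playback time stays at 3.5 seconds.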
[0145] FIG. 27 shows one example for conducting the 
N times high-speed reproduction on the basis of the de- 
scribed special reproduction control information by tak- 
ing the display cycle as a reference. 
[0146] At step S121, the display time $D'_i$ of each frame is
calculated as $D'_i = D_i / N$ for the N times high-speed
reproduction. In practice, however, display is tied to the
display cycle, so the video cannot always be displayed for
exactly the calculated time.
[0147] FIG. 28 shows a relationship between the calculated
display time and the display cycle. The time axis 1300 shows the
calculated display time while the time axis 1301 shows the
display cycle based on the display rate. If the display rate is
f frames/sec, the interval of the display cycle is 1/f sec.

[0148] Consequently, at step S122, the frame information $F_i$
whose calculated display time includes the start point of the
display cycle is searched for, and at step S123 the video
included in the frame information $F_i$ is displayed for one
display cycle (1/f sec).

[0149] For example, the display cycle 1302 (FIG. 28) 
displays the video of the frame information correspond- 
ing to this display time because the display start point 
1303 is included in the calculated display time 1304. 
[0150] As another method for associating the display cycle with
the frame information, the video whose calculated display time
is nearest to the start point of the display cycle may be
displayed, as shown in FIG. 29. If the display time becomes
smaller than the display cycle, like the display time 1305 of
FIG. 28, the display of the video may be omitted. If the video
is forcibly displayed, the display times before and after the
video are shortened so that the total display time remains
unchanged.

[0151] At step S124, it is determined whether the current display
is the final display or not. If the current display is the final
display, the processing is completed. If the display is not the
final display, the process proceeds to step S125 to conduct the
processing of the next display cycle.
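
The display-cycle procedure of FIG. 27 can be sketched as follows.
The function name is hypothetical, and the sketch returns the index
of the frame information chosen for each display cycle rather than
displaying anything. It implements only the rule that a cycle shows
the item whose calculated display time contains the start point of
the cycle.

```python
def frames_for_display_cycles(display_times, n, rate):
    """Return, per display cycle, the index of the frame information shown.

    `rate` is the display rate f in frames/sec, so each cycle lasts
    1/f sec. Items that contain no cycle start point are never shown.
    """
    ends = []                       # end time of each item's interval
    t = 0.0
    for d in display_times:
        t += d / n                  # step S121
        ends.append(t)
    shown = []
    i = 0
    cycle = 0
    while ends and cycle / rate < ends[-1]:     # steps S124, S125
        start = cycle / rate
        while ends[i] <= start:                 # step S122
            i += 1
        shown.append(i)                         # step S123
        cycle += 1
    return shown
```

In the test case below, item 2 lasts only 0.25 seconds and contains
no cycle start point at a display rate of 2 frames/sec, so it is
omitted in the same way as the display time 1305.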

[0152] FIG. 30 shows another example of a data 
structure for describing the frame information. The 
frame information included in the data structure of FIG. 
8 or FIG. 14 summarizes a single original video. A plu- 
rality of original videos can be summarized by expand- 
ing the frame information. FIG. 30 shows such an ex- 
ample. An original video location information 4202 for 
indicating the original video file location is added to the 
original video information 4201 included in the individual 
frame information. The entire file described in the original
video location information 4202 need not be handled; only a
portion of the file may be extracted and used. In this case, not
only file information such as a file name but also section
information showing which section of the file is the target is
additionally described. Plural sections may be selected from the
original video.

[0153] Furthermore, if several kinds of original videos are
present and identification information is individually added to
the videos, the original video identification information may be
described in place of the original video location information.

[0154] FIG. 31 explains an example in which a plurality of
original videos are summarized and displayed by using frame
information to which the original video location information has
been added. In this example, three videos are summarized to
display one summarized video. With respect to the video 2, two
sections 4301 and 4302 are taken out in place of the whole
section and are handled as separate videos. As the frame
information, together with this original video information, the
frame location of each representative video (4303 with respect
to 4301) is described as the video location information, while
the display time (4304 with respect to 4301) is described as the
display time information.
[0155] FIG. 32 explains another example in which a plurality of
original videos are summarized and displayed by using frame
information to which the original video location information has
been added. In this example, three videos are summarized to
display one summarized video. With respect to the video 2, a
portion of the section is taken out in place of the whole
section. A plurality of sections may be taken out as described
in FIG. 31. As the frame information, together with these items
of the original video information (for example, the section
information 4401 in addition to the video 2), the storage
location of the respective representative video files 4402 is
described as the video location information and the display time
4403 is described as the display time information.

[0156] Addition of the original video location informa- 
tion to the frame information which has been explained 
in these examples can be applied completely in the 
same way to the case in which a set of frames is used 
as video location information with the result that a plu- 
rality of original videos are summarized and displayed. 
[0157] FIG. 33 shows another data structure for describing the
frame information. In this data structure, in addition to the
video location information 101, the display time information 121
and the original video information 3701 which have already been
explained, motion information 4501 and interest region
information 4502 are added. The motion information 4501
describes the magnitude of motion (the scene change quantity) in
the section (the section described in the original video
information) of the original video corresponding to the frame
information. The interest region information 4502 describes a
region of particular interest in the video described in the
video location information.
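
One possible in-memory representation of the frame information of
FIG. 33 is sketched below. All class and field names, the use of a
frame number as the video location, and the (x, y, width, height)
layout of the interest region are assumptions made for
illustration; the description does not prescribe a concrete
encoding.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class OriginalVideoInfo:
    """Original video information with its location (hypothetical layout)."""
    location: str                               # e.g. a file name
    section: Optional[Tuple[int, int]] = None   # (start, end) frames, None = whole file


@dataclass
class FrameInformation:
    """One item of frame information (hypothetical layout)."""
    video_location: int                         # frame number of the video to display
    display_time: Optional[float] = None        # seconds; may be omitted
    original_video: Optional[OriginalVideoInfo] = None
    motion: Optional[float] = None              # scene change quantity of the section
    interest_region: Optional[Tuple[int, int, int, int]] = None  # x, y, width, height
```

Making every field except the video location optional reflects that
the display time, for instance, may be omitted when it can be
derived from the motion information at reproduction time.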

[0158] The motion information can be used for calculating the
display time of the video described in the video location
information, in the same manner as when the display time is
calculated from the motion of the video as shown in FIG. 22. In
this case, even when the display time information is omitted and
only the motion information is described, special reproduction
such as high-speed reproduction can be conducted in the same
manner as in the case in which the display time is described. In
this case, the display time is calculated at the time of
reproduction.

[0159] Both the display time information and the motion
information can be described at the same time. In that case, an
application for displaying uses whichever of the two it
requires, or uses both in combination, in accordance with the
processing.

[0160] For example, a display time calculated irrespective of the
motion is described in the display time information. A display
time calculated by a method for cutting out important scenes
from the original video is one such case. At the time of
high-speed reproduction of the summarized contents calculated in
this manner, the motion information is used so that a portion
with a large motion is reproduced slowly while a portion with a
small motion is reproduced quickly, with the result that
high-speed reproduction in which little is overlooked is
enabled.

[0161] The interest region information is used when a region of
particular interest is present in the video described in the
video location information of the frame information. For
example, the faces of persons who seem to be important
correspond to this. When a video including such interest region
information is displayed, a square frame may be overlaid so that
the interest region can be easily found. The frame display is
not indispensable, and the video may simply be displayed as it
is.

[0162] The interest region information can also be used for
processing and displaying the special reproduction control
information such as the frame information. For example, if only
a part of the frame information is reproduced and displayed, the
frame information including the interest region information is
displayed with priority. Further, frame information including a
square region with a large area may be assumed to have higher
importance, thereby making it possible to display the video
selectively.

[0163] As shown above, there has been explained an 
example in which the processing is conducted on the 
basis of the scene change quantity. Hereinafter, there 
will be explained a case in which the importance infor- 
mation is used. 

[0164] FIG. 34 is a view showing examples of a data 
structure of the frame information attached to the video. 
[0165] Importance information 122 is described in addition to or
in place of the display time control information 102 in the data
structure of the frame information of FIG. 1. The display time
is calculated based on the importance information 122.

[0166] The importance information 122 represents the importance
of the corresponding frame (or a set of frames). The importance
is represented, for example, as an integer in a constant range
(for example, 0 to 100), or as a real number in a constant range
(for example, 0 to 1). Otherwise, the importance information 122
may be represented as an integer or a real number without
setting an upper limit. The importance information 122 may be
attached to all the frames of the video, or only to the frames
at which the importance changes.
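
When the importance is attached only to the frames at which it
changes, a per-frame sequence can be recovered by holding each
value until the next change. The sketch below assumes a dictionary
that maps a frame number to the importance valid from that frame
on; the function name, the dictionary layout and the default value
used before the first entry are all hypothetical.

```python
def expand_importance(changes, num_frames, default=0):
    """Expand sparse {frame_number: importance} entries to one value per frame."""
    per_frame = []
    current = default
    for n in range(num_frames):
        if n in changes:
            current = changes[n]    # importance changes at this frame
        per_frame.append(current)
    return per_frame
```

For example, entries at frames 0 and 3 of a five-frame video yield
the first value for frames 0 to 2 and the second for frames 3 and 4.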

[0167] In this case as well, it is possible to take any 
form of FIGS. 9, 10, 12, and 13. The frame extraction 
method of FIGS. 11, 20, and 21 can be used. In this 
case, the scene change quantity of FIGS. 11, 20, and 
21 may be replaced by the importance. 
[0168] In the example which has been explained above, the display
time is set by the scene change quantity. However, the display
time may also be set by the importance information. Hereinafter,
the method for setting the display time in this way will be
explained.
[0169] When the display time is set on the basis of the scene
change quantity as exemplified above, the display time is set
long where the change quantity is large and short where the
change quantity is small, so that the video contents can be
understood well. When the display time is set on the basis of
the importance, the display time is set long where the
importance is high and short where the importance is low. Since
the method for setting the display time according to the
importance is basically similar to the method based on the scene
change quantity, it will be explained only briefly.

[0170] FIG. 36 shows one example of the basic processing
procedure in this case.
[0171] At step S191, the importance of all frames of the original
video is calculated. A concrete method thereof will be
exemplified later.
[0172] At step S192, the total of the importance from 
the description object frame described in the video lo- 
cation information to the next description object frame 
will be calculated. 

[0173] FIG. 37 describes the change in the importance for each
frame. Reference numeral 2200 denotes the importance. When the
display time of the video having the location information of the
frame information $F_i$ is calculated, the importance in the
section up to $F_{i+1}$, which is the next description object
frame location, is accumulated. The accumulation result is the
area $S'_i$ of the hatched portion 2202.

[0174] At step S193, the display time of each frame is
calculated. Suppose that the ratio of the display time allocated
to the video at each frame location $F_i$ to the total
reproduction time is set to $S'_i / \sum_j S'_j$. When the total
reproduction time is $T$, the display time of each video becomes
$D_i = T \times S'_i / \sum_j S'_j$. The value of the total
reproduction time $T$ is a standard reproduction time, defined
as the total reproduction time of the original video.

[0175] When the total of the importance becomes $S'_i = 0$, a
preset lower limit value (for example, 1) may be described, or
the frame information may not be described. Even if $S'_i = 0$
does not hold, when the importance is so small that such a frame
would virtually not be displayed, the lower limit value may be
described or the frame information may not be described. If the
frame information is not described, the value of $S'_i$ may or
may not be added to $S'_{i+1}$.
[0176] As shown in FIG. 34, in the data structure of the frame
information of FIG. 1, the video location information 101, the
display time information 121 and the importance information 122
may be described in each frame information "i". At the time of
the special reproduction, any of the following is possible: the
display time information 121 is used but the importance
information 122 is not used; the importance information 122 is
used but the display time information 121 is not used; both the
importance information 122 and the display time information 121
are used; or neither the importance information 122 nor the
display time information 121 is used.

[0177] The processing of calculating the display time 
can be conducted for preparing the frame information 
with the special reproduction control information creat- 
ing apparatus. However, the processing may be con- 
ducted on the side of the video reproduction apparatus 
at the time of the special reproduction. 
[0178] Next, a method (for example, step S191 of FIG. 36) for
calculating the importance of each frame or scene (video frame
section) will be explained.
[0179] Since various factors are normally intertwined in judging
whether a certain scene of a video is important, the most
appropriate method for calculating the importance is one in
which a human determines the importance. In this method, an
importance evaluator evaluates the importance for each scene of
the video, or at constant intervals, and the result is input as
the importance data. The importance data referred to here is a
correspondence table between frame numbers or times and
importance values. In order to avoid subjective evaluation of
the importance, a plurality of importance evaluators may
evaluate the same video, and the average value (a median or the
like will also do) for each scene or each video frame section is
calculated to finally determine the importance. Such manual
input of the importance data makes it possible to reflect in the
importance vague impressions and multiple factors which cannot
be expressed in words.

[0180] To save the trouble of human determination, it is
preferable to identify a phenomenon that is likely to accompany
a video scene which seems important, and to use processing that
automatically evaluates this phenomenon and converts it into
importance. Here, some examples are shown in which the
importance is automatically created.
[0181] FIG. 38 shows an example of a processing procedure for
automatically calculating importance data on the basis of the
idea that a scene having a large sound level is important.
FIG. 38 also serves as a function block diagram.

[0182] In the sound level calculation processing at step S210,
when the sound data attached to the video is input, the sound
level at each time is calculated. Since the sound level changes
greatly from instant to instant, smoothing processing or the
like may be conducted in the sound level calculation processing
at step S210.
[0183] In the importance calculation processing at step S211,
the sound level output as a result of the sound level
calculation processing is converted into the importance. For
example, the input sound level is linearly converted into a
value of 0 to 100, with a lowest sound level set in advance
being mapped to 0 and a highest sound level set in advance being
mapped to 100. A sound level not more than the lowest sound
level is set to 0, while a sound level not less than the highest
sound level is set to 100. As a result of the importance
calculation processing, the importance at each time is
calculated and output as importance data.
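
The smoothing at step S210 and the linear conversion at step S211
can be sketched as follows. The function names are hypothetical,
and the trailing moving average is only one possible smoothing
method, chosen here for illustration.

```python
def smooth(levels, window=3):
    """Step S210 (optional): trailing moving average of the sound level."""
    smoothed = []
    for i in range(len(levels)):
        recent = levels[max(0, i - window + 1): i + 1]
        smoothed.append(sum(recent) / len(recent))
    return smoothed


def sound_level_to_importance(level, lowest, highest):
    """Step S211: map [lowest, highest] linearly onto 0 to 100, clamping outside."""
    if level <= lowest:
        return 0.0
    if level >= highest:
        return 100.0
    return 100.0 * (level - lowest) / (highest - lowest)
```

With a lowest level of 20 and a highest level of 60, a sound level
of 30 is converted into an importance of 25.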

[0184] FIG. 39 shows an example of a processing procedure of
another method for automatically calculating the importance.
FIG. 39 also serves as a function block diagram.

[0185] In the processing of FIG. 39, it is determined that a
scene in which important words registered in advance appear
frequently in the sound attached to the video is important.

[0186] In the sound recognition processing at step S220, when
the sound data attached to the video is input, the language
(words) spoken by a person is converted into text data.
[0187] In the important word dictionary 221, words which are
likely to appear in important scenes are registered. If the
degree of importance differs among the registered words, a
weight is attached to each of the registered words.

[0188] In the word collation processing at step S222, the text
data which is an output of the sound recognition processing is
collated with the words registered in the important word
dictionary 221 to determine whether or not important words have
been spoken.

[0189] In the importance calculation processing at step S223,
the importance in each scene of the video or at each time is
calculated from the result of the word collation processing. In
this calculation, the number of appearances of important words
and the weights of the important words are used. For example,
the importance around the time at which important words have
appeared (or of the scene in which the important words have
appeared) is increased by a constant value, or by a value
proportional to the weight of the important words. As a result
of the importance calculation processing, the importance at each
time is calculated and output as importance data.
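
Steps S222 and S223 can be sketched as follows. The sketch assumes
that the sound recognition output is already available as
(time slot, word) pairs and that the dictionary maps each important
word to its weight. The function name, the time-slot representation
and the `spread` parameter, which controls how many neighboring
slots count as "around" a word, are all hypothetical.

```python
def word_importance(timed_words, dictionary, num_slots, spread=1):
    """Raise the importance around each important word by its weight.

    `timed_words` is a list of (slot_index, word) pairs and
    `dictionary` maps important words to weights.
    """
    importance = [0.0] * num_slots
    for t, word in timed_words:
        weight = dictionary.get(word)           # step S222: collation
        if weight is None:
            continue                            # not an important word
        first = max(0, t - spread)
        last = min(num_slots, t + spread + 1)
        for s in range(first, last):            # step S223
            importance[s] += weight
    return importance
```

Using the same weight for every dictionary entry corresponds to
increasing the importance by a constant value per appearance.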
[0190] If the weights of all the words are set to the same
value, the important word dictionary 221 becomes unnecessary.
This is because it is then assumed that a scene in which many
words are spoken is important. In this case, in the word
collation processing at step S222, the number of words output
from the sound recognition processing is counted. Not only the
number of words but also the number of characters may be
counted.

[0191] FIG. 40 shows an example of a processing procedure of
still another method for automatically calculating the
importance. FIG. 40 also serves as a function block diagram.

[0192] The processing of FIG. 40 determines that a scene in
which many important words registered in advance appear in the
telop appearing in the video is important.

[0193] In the telop recognition processing at step S230, the
character location in the video is specified, and the characters
are recognized by binarizing the video region at the character
location. The recognized result is output as text data.
[0194] The important word dictionary 231 is the same 
as the important word dictionary 221 of FIG. 39. 
[0195] In the word collation processing at step S232, 
in the same manner as at step S222 in the procedure of 
FIG. 39, the text data which is an output of the telop 
recognition processing is collated with the words regis- 
tered in the important word dictionary 231 to determine 
whether or not important words have appeared. 
[0196] In the importance calculation processing at step S233,
the importance at each scene or at each time is calculated from
the number of appearances of important words and the weights of
the important words, in the same manner as at step S223 in the
procedure of FIG. 39. As a result of the importance calculation
processing, the importance at each time is determined and output
as importance data.

[0197] If the weights of all the words are set to the same
value, the important word dictionary 231 becomes unnecessary.
This is because it is then assumed that a scene in which many
words appear is an important scene. In this case, in the word
collation processing at step S232, the number of words output
from the telop recognition processing is simply counted. Not
only the number of words but also the number of characters may
be counted.
[0198] FIG. 41 shows an example of a processing pro- 
cedure of a method for automatically calculating still an- 
other importance level. FIG. 41 is established as a func- 
tion block diagram. 

[0199] The processing of FIG. 41 determines that 
when the telop appearing in the video is in larger char- 
acter size, the scene is more important. 
[0200] In the telop detection processing at step S240, 
the processing is conducted for specifying the location 
of character string in the video. 

[0201] In the character size calculation processing at step
S241, individual characters are extracted to calculate the
average value or the maximum value of the size (area) of the
characters.

[0202] In the importance calculation processing at 
step S242, the importance is calculated which is propor- 
tional to the size of the character which is an output of 
the character size calculation processing. If the calcu- 
lated importance is too large or too small, the processing 
is conducted for restricting the importance to a preset 
range with the threshold value processing. As a result 
of the importance calculation processing, the impor- 
tance at each time is calculated to be output as impor- 
tance data. 

[0203] FIG. 42 shows an example of the processing 
procedure of a method for automatically calculating still 
another importance level. FIG. 42 is established as a 
function block diagram. 

[0204] The processing of FIG. 42 determines that the 
scene in which human faces appear in the video is im- 
portant. 

[0205] In the face detection processing at step S250, processing
is conducted for detecting areas which look like a human face in
the video. As a result of the processing, the number of areas
(number of faces) determined to be a human face is output.
Information on the size (area) of each face may be output at the
same time.

[0206] In the importance calculation processing at step S251,
the number of faces output by the face detection processing is
multiplied by a constant to calculate the importance. If the
output of the face detection processing includes face size
information, the calculation is conducted so that the importance
increases with an increase in the size of the faces. For
example, the area of the face is multiplied by a constant to
calculate the importance. As a result of the importance
calculation processing, the importance at each time is
calculated and output as importance data.
[0207] FIG. 43 shows an example of the processing procedure of
still another method for automatically calculating the
importance. FIG. 43 also serves as a function block diagram.

[0208] In the processing of FIG. 43, it is determined that a
scene in which a video similar to a video registered in advance
appears is important.
[0209] The video which should be determined to be important is
registered in the important scene dictionary 260. The video is
recorded as raw data or in a compressed form. Instead of the
video itself, a characteristic quantity (a color histogram, a
frequency or the like) of the video may be recorded.
[0210] In the similarity/non-similarity calculation processing
at step S261, the similarity or non-similarity between the video
registered in the important scene dictionary 260 and the input
video data is calculated. As the non-similarity, the total of
the squared errors or the total of the absolute differences is
used. If video data is recorded in the important scene
dictionary 260, the total of the squared errors or the total of
the absolute differences over the corresponding pixels is
calculated as the non-similarity. If the color histogram of the
video is recorded in the important scene dictionary 260, the
same color histogram is calculated for the input video data, and
the total of the squared errors or the total of the absolute
differences between the histograms is set as the non-similarity.
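
The histogram-based non-similarity can be sketched as follows. The
function names, the flat list of 8-bit intensity values used as
input and the number of histogram bins are assumptions made for
illustration; a real color histogram would typically cover several
color channels.

```python
def color_histogram(pixels, bins=4, max_value=256):
    """Count pixel values per bin (a single-channel stand-in for a color histogram)."""
    hist = [0] * bins
    for v in pixels:
        hist[v * bins // max_value] += 1
    return hist


def sum_squared_error(a, b):
    """Total of the squared errors between corresponding entries."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sum_absolute_difference(a, b):
    """Total of the absolute differences between corresponding entries."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

The same two distance functions can be applied directly to pixel
values when raw video data is registered in the dictionary instead
of a histogram.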

[0211] In the importance calculation processing at step S262,
the importance is calculated from the similarity or
non-similarity output by the similarity/non-similarity
calculation processing. The importance is calculated in such a
manner that a larger similarity gives a greater importance if
the similarity is input, while a larger non-similarity gives a
smaller importance if the non-similarity is input. As a result
of the importance calculation processing, the importance at each
time is calculated and output as the importance data.
[0212] Furthermore, as another method for automatically
calculating the importance, a scene having a high instant
viewing rate may be set as an important scene. The data on the
instant viewing rate is obtained by tallying the results of a
viewing rate survey, and the importance is calculated by
multiplying the instant viewing rate by a constant. Needless to
say, there are various other methods.

[0213] The importance calculation processing may use a
single method, or a plurality of data items may be
used at the same time to calculate the importance. In
the latter case, for example, the importance of one video
is calculated with several different methods, and the
final importance is obtained as their average value or
maximum value.
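
As a rough sketch of the importance calculation, the following assumes one possible monotonic mapping from non-similarity to importance and shows the average and maximum combinations mentioned above. The mapping `1 / (1 + x)`, the scale constant, and all function names are illustrative assumptions; the embodiment only requires that larger non-similarity give smaller importance.

```python
def importance_from_non_similarity(non_similarity):
    """Larger non-similarity gives smaller importance (assumed mapping)."""
    return 1.0 / (1.0 + non_similarity)


def importance_from_viewing_rate(rate, constant=1.0):
    """Instant viewing rate multiplied by a constant."""
    return constant * rate


def combine_importance(values, mode="average"):
    """Combine importances obtained with several different methods."""
    if not values:
        raise ValueError("at least one importance value is required")
    if mode == "average":
        return sum(values) / len(values)
    if mode == "maximum":
        return max(values)
    raise ValueError("mode must be 'average' or 'maximum'")


candidates = [
    importance_from_non_similarity(3.0),      # 0.25
    importance_from_viewing_rate(0.5, 1.5),   # 0.75
]
final_average = combine_importance(candidates, "average")
final_maximum = combine_importance(candidates, "maximum")
```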

[0214] In the above embodiment, the explanation has
been given by citing the scene change quantity and the 
importance. However, it is possible to use one item of 
information or a plurality of items of information (de- 
scribed in the frame information) together with the scene 
change quantity or the importance or instead of the 
scene change quantity or importance. 
[0215] Next, there will be explained a case in which
information for the control of reproduction/non-reproduction
is added to the frame information (see FIG. 1).
[0216] In some cases, it is desired to reproduce only a
specific scene or a part thereof (for example, a highlight
scene), or only a scene or a part thereof in which a
specific person appears. In other words, there is a demand
for watching only a portion of the video.

[0217] In order to satisfy this desire, the reproduction/ 
non-reproduction information may be added to the 
frame information for controlling the reproduction or the 
non-reproduction. As a consequence, only a part of the 
video is reproduced or only a part of the video is not 
reproduced on the basis of the reproduction/non-repro- 
duction information. 

[0218] FIGS. 44, 45, and 46 show examples of a data 
structure in which the reproduction/non-reproduction in- 
formation is added. 

[0219] FIG. 44 shows a data structure in which the
reproduction/non-reproduction information 123 is added
to the data structure of FIG. 8. FIG. 45 shows a data
structure in which the reproduction/non-reproduction
information 123 is added to the data structure of FIG. 34.
FIG. 46 shows a data structure in which the
reproduction/non-reproduction information 123 is added to the
data structure of FIG. 35. Though not shown, it is
possible to add the reproduction/non-reproduction
information 123 to the data structure of FIG. 1.
[0220] The reproduction/non-reproduction informa- 
tion 123 may be binary information specifying whether 
the video is reproduced or not or a continuous value 
such as reproduction level or the like. 
[0221] For example, in the latter case, when the re- 
production level exceeds a certain threshold value at the 
time of reproduction, the video is reproduced. When the 
reproduction level is less than the threshold value, the 
video is not reproduced. The user can directly or indi- 
rectly specify the threshold value. 
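
The binary and continuous forms of the reproduction/non-reproduction information can be handled by a single decision function, sketched below. The function name and the default threshold are hypothetical, and treating a level exactly equal to the threshold as non-reproduction is an assumption, since the text only covers the "exceeds" and "less than" cases.

```python
def is_reproduced(info, threshold=0.5):
    """Decide reproduction from binary or level-valued information.

    info: bool for binary information, or a number for a reproduction level.
    threshold: user-specified threshold (hypothetical default).
    """
    if isinstance(info, bool):
        return info
    # A level equal to the threshold is treated as non-reproduction (assumption).
    return info > threshold


levels = [0.9, 0.2, 0.6, 0.5]
decisions = [is_reproduced(level, threshold=0.5) for level in levels]
```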
[0222] The reproduction/non-reproduction information
123 may be stored as independent information.
Alternatively, if the reproduction or non-reproduction is
selectively specified, the non-reproduction can be specified
by setting the display time shown in the display time
information 121 to a specific value (for example, 0 or
-1), or by setting the importance indicated by the
importance information 122 to a specific value (for example,
0 or -1). In these cases, the reproduction/non-reproduction
information 123 need not be added.

[0223] If the reproduction or non-reproduction is
specified with a level value, the display time information 121
and/or the importance information 122 (represented by
the level value) can be used as a substitute.
[0224] If the reproduction/non-reproduction information
123 is maintained as independent information, the
quantity of data increases accordingly. In return, it is
possible to see a digest of the video by not reproducing,
on the reproduction side, the portion specified as
non-reproduction, and it is also possible to see the whole
video by reproducing the portion specified as
non-reproduction. If the reproduction/non-reproduction
information 123 is not maintained as independent information,
it is necessary to appropriately change the display time
specified, for example, as 0 in order to see the whole
video by reproducing the portion specified as
non-reproduction.

[0225] The reproduction/non-reproduction information
123 may be input by a person or may be determined from
some conditions. For example, when the motion information
of the video is equal to or greater than a constant value,
the video is reproduced, and when the motion information
is less than the constant value, the video is not
reproduced, so that only portions with brisk motion are
reproduced. When it is determined from color information
that the amount of skin color is larger or smaller than a
constant value, only the scenes where a person appears
can be reproduced. A method for calculating the
information from the magnitude of sound, and a method for
calculating the information from reproduction program
information which is input in advance, can also be
considered. The importance may be calculated with some
technique to create the reproduction/non-reproduction
information 123 from the importance information. When
the reproduction/non-reproduction information is set to
a continuous value, it may be calculated by converting
the importance into the reproduction/non-reproduction
information.

[0226] FIG. 47 shows an example in which
reproduction/non-reproduction control is carried out so that video
is reproduced on the basis of the
reproduction/non-reproduction information 123.

[0227] In FIG. 47, it is supposed that the original video
2151 is reproduced on the basis of the video frame
location information represented with $F_1$ through $F_6$, or
the video frame group location information 2153, and the
display time information represented with $D_1$ through $D_6$.
At this time, it is supposed that the
reproduction/non-reproduction information is added to the display time
information 2154. In this example, the sections of $D_1$,
$D_2$, $D_4$ and $D_6$ are to be reproduced and the other
sections are not, so that the sections of $D_1$, $D_2$, $D_4$
and $D_6$ are continuously reproduced as the reproduction
video 2152 (while the other sections are not reproduced).
[0228] For example, for the frame $F_i$ of the reproduction
video, let the display time be $D^+_i$ when the
reproduction/non-reproduction information 123 shows
reproduction, and $D^-_i$ when the
reproduction/non-reproduction information 123 shows non-reproduction.
Then $\sum_i D^+_i = T'$, where $T'$ is the total time of
the reproduction portion of the original video.
Normally, the display time $D^+_i$ is set to the time which
is required to reproduce the original video at normal
speed. The reproduction speed may instead be set to a
predetermined high speed, and information may be described
as to how many times the normal speed is to be used. When
it is desired that the video is reproduced at N times
speed, the display time $D^+_i$ of each reproduction
portion is multiplied by 1/N. For example, in order to
perform reproduction within a predetermined time $D'$, the
display time $D^+_i$ of each reproduction portion may be
multiplied by $D' / \sum_i D^+_i$ before display.
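
The two scalings of the display times described above, multiplication by 1/N for N times speed and scaling so that the total equals a predetermined time, can be sketched as follows. The function names are hypothetical and the display times are assumed to be given in seconds.

```python
def scale_for_speed(display_times, n):
    """Display times for N times speed: each time is multiplied by 1/N."""
    if n <= 0:
        raise ValueError("speed factor must be positive")
    return [d / n for d in display_times]


def scale_to_total(display_times, target_total):
    """Scale display times so that their total equals target_total."""
    total = sum(display_times)
    if total <= 0:
        raise ValueError("total display time must be positive")
    factor = target_total / total
    return [d * factor for d in display_times]


# Display times of the reproduced sections only (for example D1, D2, D4, D6).
reproduced = [2.0, 4.0, 1.0, 1.0]
double_speed = scale_for_speed(reproduced, 2)
fit_in_four_seconds = scale_to_total(reproduced, 4.0)
```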

[0229] If the display time of each frame (or a frame 
group) is determined on the basis of the frame informa- 
tion, the determined display time may be adjusted. 
[0230] In a method in which the calculated display 
time is not adjusted, the display time which is calculated 
without taking into consideration the generation of the 
non-reproduction section is used as it is, so that when 
the display time exceeding 0 is originally allocated to the 
non-reproduction section the whole display time is 
shortened for that allocation portion. 
[0231] In a method in which the calculated display
time is adjusted, for example, if a display time exceeding
0 was originally allocated to the non-reproduction
section, the adjustment is made by multiplying the display
time of each of the frames (or frame groups) to be
reproduced by a constant, so that the whole display time
becomes equal to the display time that would result if the
non-reproduction section were also reproduced.
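
One way to read the adjustment above is that the time originally allocated to skipped sections is redistributed proportionally over the reproduced frames. The sketch below assumes that reading; the `(display_time, reproduce)` pair representation and the function name are hypothetical.

```python
def adjust_display_times(frames):
    """Rescale reproduced frames so the total matches the unfiltered total.

    frames: list of (display_time, reproduce) pairs, where display_time was
    calculated without regard to the non-reproduction sections.
    Returns the adjusted display times of the reproduced frames only.
    """
    total_all = sum(t for t, _ in frames)
    kept = [t for t, reproduce in frames if reproduce]
    total_kept = sum(kept)
    if total_kept == 0:
        return []  # nothing is reproduced, so there is nothing to adjust
    factor = total_all / total_kept
    return [t * factor for t in kept]


frames = [(1.0, True), (2.0, False), (1.0, True)]
adjusted = adjust_display_times(frames)
```

Without the adjustment, the same input would simply play for 2.0 seconds instead of 4.0, which corresponds to the unadjusted method of the preceding paragraph.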
[0232] The user may make a selection as to whether 
the adjustment is to be made. 

[0233] If the user specifies the N times reproduction, 
the N times high-speed reproduction processing may be 
conducted without the adjustment of the calculated dis- 
play time. The N times high-speed reproduction 
processing may be conducted on the basis of the display 
time after the adjustment of the calculated display time 
in the above manner (the display time of the former be- 
comes shorter). 

[0234] The user may specify the whole display time. 
In this case as well, for example, the display time of each 
frame (or a frame group) to be reproduced is multiplied 
by a constant number to make an adjustment so that the 
display time becomes equal to the specified whole dis- 
play time. 

[0235] FIG. 48 shows one example of the processing 
procedure for reproducing only a portion of the video on 
the basis of the reproduction/non-reproduction informa- 
tion 123. 

[0236] At step S162, the frame information (video
location information and display time information) is read,
and at step S163 it is determined, from the
reproduction/non-reproduction information in the display time
information, whether the frame is to be reproduced.
[0237] When it is determined that the reproduction is
to be conducted, the frame is displayed for the display
time at step S164. When it is determined that the
reproduction is not to be conducted, the frame is not
displayed and the processing moves to the next frame.

[0238] It is determined at step S161 whether or not 
the whole video to be reproduced is processed. When 
the whole video is processed, the reproduction process- 
ing is also ended. 
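
The procedure of steps S161 through S164 can be sketched as a simple loop. The `FrameInfo` fields and the `display` callback are hypothetical names introduced for illustration; an actual player would decode and present the frame where the callback is invoked.

```python
from dataclasses import dataclass


@dataclass
class FrameInfo:
    location: int        # video location information
    display_time: float  # display time information
    reproduce: bool      # reproduction/non-reproduction information


def play(frames, display):
    """Reproduce only the frames marked for reproduction."""
    for frame in frames:  # S161: the loop ends when the whole video is processed
        # S162: the frame information has been read into `frame`.
        if frame.reproduce:  # S163: reproduction or non-reproduction?
            display(frame.location, frame.display_time)  # S164
        # Otherwise the frame is skipped and the next frame is processed.


shown = []
play(
    [FrameInfo(0, 1.0, True), FrameInfo(10, 2.0, False), FrameInfo(20, 0.5, True)],
    lambda location, seconds: shown.append((location, seconds)),
)
```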

[0239] When determining at step S163 whether the frame
is to be reproduced, it is desired in some cases that the
determination depends on the taste of the user. In such
cases, it is determined from the user profile, in advance
of the reproduction of the video, whether or not the
non-reproduction portion is to be reproduced. When the
non-reproduction portion is to be reproduced, the frame is
always reproduced at step S164.
[0240] In addition, when the reproduction/non-reproduction
information is described as a continuous value,
a threshold value for differentiating between reproduction
and non-reproduction is determined from the user profile,
and reproduction or non-reproduction is determined
depending on whether or not the reproduction/non-reproduction
information exceeds the threshold value.
Instead of using the user profile, for example, the
threshold value may be calculated from the importance set
for each frame, or information as to reproduction or
non-reproduction may be received from the user in advance
or in real time.
[0241] In this manner, it becomes possible to repro- 
duce only a portion of the video by adding to the frame 
information the reproduction/non-reproduction informa- 
tion 123 for controlling whether the video is reproduced 
or not with the result that it becomes possible to repro- 
duce only the high-light scene or only the scene in which 
a man or an object of interest appears. 
[0242] Next, there will be explained a describing
method in which the location information of media (for
example, text or sound) other than the video, associated
with the video to be displayed, and the time for
displaying or reproducing that media are added to the frame
information (see FIG. 1) as additional information.
[0243] In FIG. 8, the video location information 101
and the display time information 102 are included in
each frame information 100. In FIG. 34, the video
location information 101 and importance information 122 are
included in each frame information 100. In FIG. 35, the
video location information 101, the display time
information 121, and importance information 122 are included
in each frame information 100. In FIGS. 44, 45, and 46,
there is further shown an example in which the
reproduction/non-reproduction information 123 is included in
each frame information 100. In any of these examples, zero
or more items of sound location information 2703 and sound
reproduction time information 2704, and zero or more items
of text information 2705 and text display time information
2706 (however, one or more items in total) may be added.
[0244] FIG. 49 shows an example in which one set of
sound location information 2703 and sound reproduction
time information 2704 and N sets of text information
2705 and text display time information 2706 are added
to an example of the data structure of FIG. 8.
[0245] The sound is reproduced for the time indicated
by the sound reproduction time information 2704 from
the location indicated by the sound location information
2703. The object of reproduction may be sound information
attached to the video from the beginning, or background
music may be created and newly added.
[0246] The text indicated by the text information 2705
is displayed for the time indicated by the
text display time information 2706. A plurality of items
of text information may be added to one video frame.
[0247] The time when the sound reproduction and the 
text display are started is the same as the time when the 
associated video frame is displayed. The sound repro- 
duction time and the text display time are set within the 
range of the associated video frame time. If continuous 
sound is reproduced over a plurality of video frames, the 
sound location information and the reproduction time 
may be set to be continuous. 

[0248] With such a method, summarized sound and 
summarized text can be made possible. 
[0249] FIG. 50 shows one example of a method for
describing the sound information separately from the
frame information. This is an example of a data structure
for reproducing sound associated with the video frame
which is displayed when the special reproduction is
conducted. A set of the location information 2801 showing
the location of the sound to be reproduced, the
reproduction start time 2802 at which the sound reproduction
is started, and the reproduction time 2803 for which the
reproduction is continued is set as one item of sound
information 2800, and the sound information is described
as an array of such items.

[0250] FIG. 51 shows a data structure for describing
the text information. The data structure has the same
structure as the sound information of FIG. 50: a set
of character code location information 2901 of the text
to be displayed, a display start time 2902, and a display
time 2903 is set as one item of text information 2900, and
the text information is described as an array of such
items. Instead of the character code location information
2901, location information may be used which indicates a
location where the character code is stored, or a location
where the character is stored as a video.

[0251] The above sound information or text information
is synchronized with the display of the video frames and
is presented as information associated with a video frame,
or with a certain video frame section in which the
displayed video frame is present. As shown in FIG.
52, the reproduction or the display of the sound
information or the text information is started with the lapse
of time shown by the time axis 3001. First, the video 3002
is displayed and reproduced, each video frame for its
described display time, in the order in which the
respective video frames are described. Reference numerals
3005, 3006 and 3007 denote respective video frames, and a
predetermined display time is allocated to each. The
sound 3003 is reproduced when the reproduction start
time described in each sound information comes. When
the reproduction time described in a similar manner has
elapsed, the reproduction is stopped. As shown
in FIG. 52, a plurality of sounds 3008 and 3009 may be
reproduced. In a similar manner to the sound, the text
3004 is displayed when the display start time described
in each item of text information comes. When the
described display time has elapsed, the display is
stopped. A plurality of texts 3010 and 3011
may be displayed at the same time.
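
The structures of FIGS. 50 and 51 share the same three fields, so the timeline behavior of FIG. 52 can be sketched with one record type. The class name `TimedItem`, its field names, and the half-open interval convention (an item is no longer active at the instant its time elapses) are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class TimedItem:
    location: str      # location information 2801 / 2901
    start_time: float  # reproduction start time 2802 / display start time 2902
    duration: float    # reproduction time 2803 / display time 2903


def active_items(items, t):
    """Items being reproduced or displayed at time t on the time axis."""
    return [
        item for item in items
        if item.start_time <= t < item.start_time + item.duration
    ]


sounds = [TimedItem("sound_a", 0.0, 5.0), TimedItem("sound_b", 3.0, 4.0)]
texts = [TimedItem("text_a", 1.0, 2.0)]

playing_at_4 = [s.location for s in active_items(sounds, 4.0)]
playing_at_6 = [s.location for s in active_items(sounds, 6.0)]
shown_at_4 = [t.location for t in active_items(texts, 4.0)]
```

Because the start time and duration are stored independently of the frame information, several items may overlap, as with the sounds 3008 and 3009.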
[0252] It is not required that the sound reproduction
start time and the text display start time coincide with
the time at which the video frame is displayed. Nor is it
required that the sound reproduction time and the text
display time coincide with the display time of the video
frame. These times can be freely set. Conversely,
the display time of the video frame may be changed in
accordance with the sound reproduction time and the
text display time.

[0253] These times can be set manually by a person.

[0254] In order to save the trouble of manual
determination, it is preferable to detect a phenomenon which
is likely to appear in a video scene which seems to be
important and to automatically set these times.
Hereinafter, several examples of automatic setting are shown.

[0255] FIG. 53 shows one example of a processing
procedure in which a continuous video frame section from
one change-over of the screen up to the next change-over
of the screen, which is referred to as a shot, is
determined, and the total of the display times of the video
frames included in the shot is defined as the sound
reproduction time. FIG. 53 is also established as a function
block diagram.

[0256] At step S3101, the shot is detected from the
video. For this purpose, methods such as a method for
detecting a cut of a motion picture from the MPEG bit
streams using a tolerance ratio detection method (The
Transactions of the Institute of Electronics, Information
and Communication Engineers, Vol. J82-D-II, No. 3,
pp. 361-370, 1999) are used.

[0257] At step S3102, the video frame location
information is referred to in order to determine which shot
the respective video frames belong to. Furthermore, the
display time of each shot is calculated by taking
the total of the display times of its video frames.

[0258] For example, the sound location information is
set to the sound location corresponding to the start of
the shot. The sound reproduction start time may be made
to coincide with the display time of the initial video
frame which belongs to each shot, while the sound
reproduction time may be set to be equal to the display
time of the shot. Otherwise, the display time of the video
frames included in each shot may be corrected in
accordance with the reproduction time of the sound. Although
the shot is detected here, if a data structure is
used in which the importance information is described
in the frame information, a section having importance
exceeding a threshold value may be determined by using
the importance of the video frames, and
the sound included in that section may be reproduced.
[0259] If the determined reproduction time does not 
meet a constant reference, the sound may not be repro- 
duced. 
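
The per-shot sound timing of FIG. 53 can be sketched as follows, assuming that shot detection has already assigned a shot identifier to each video frame. The function name, the parallel-list input format, and the `min_duration` parameter (standing in for the "constant reference" of the preceding paragraph) are hypothetical.

```python
def shot_sound_schedule(frame_display_times, frame_shot_ids, min_duration=0.0):
    """Return {shot_id: (sound_start_time, sound_reproduction_time)}.

    The sound start time coincides with the display of the first frame of
    the shot, and the reproduction time equals the total display time of
    the frames in the shot. Shots shorter than min_duration get no sound.
    """
    schedule = {}
    clock = 0.0
    for display_time, shot in zip(frame_display_times, frame_shot_ids):
        if shot not in schedule:
            schedule[shot] = [clock, 0.0]  # first frame of this shot
        schedule[shot][1] += display_time
        clock += display_time
    return {
        shot: (start, duration)
        for shot, (start, duration) in schedule.items()
        if duration >= min_duration
    }


times = [1.0, 2.0, 0.5, 1.5]
shots = [0, 0, 1, 1]
all_shots = shot_sound_schedule(times, shots)
long_shots = shot_sound_schedule(times, shots, min_duration=2.5)
```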

[0260] FIG. 54 shows one example of a processing 
procedure in which important words are taken out from 
sound data corresponding to the shot or the video frame 
section having the high importance with sound recogni- 
tion so that the words, or the sound including the words 
or the sound in which a plurality of words are combined 
are reproduced. FIG. 54 is also established as a function 
block diagram. 

[0261] At step S3201, the shot is detected. In place of
the shot, the video frame section having high importance
may be calculated.

[0262] At step S3202, the sound recognition is carried 
out with respect to the sound data section correspond- 
ing to the obtained video frame section. 
[0263] At step S3203, sounds including the important 
word portion or sounds of the important word portion are 
determined from the recognition result. In order to select 
the important words, an important word dictionary 3204 
is referred to. 

[0264] At step S3205, the sound for reproduction is 
created. Continuous sounds including the important 
words may be used as they are. Only important words 
may be extracted. Sounds having a combination of a 
plurality of important words may be created. 
[0265] At step S3206, the display time of the video
frame is corrected in accordance with the reproduction
time of the created sound. Alternatively, the number of
selected words may be decreased and the reproduction
time of the sound shortened so that the sound
reproduction time falls within the display time of
the video frame.

[0266] FIG. 55 shows one example of a procedure in 
which text information is obtained from the telop. FIG. 
55 is also established as a function block diagram. 
[0267] In the processing of FIG. 55, the text informa- 
tion is obtained from the telop or the sound displayed in 
the video. 

[0268] At step S3301, the telop displayed in the video
is read. The telop in the original video may be
automatically extracted, for example with the method
described in "A method for extracting the character
portion from the video for the telop region" by Osamu
Hori, CVIMI 114-17, pp. 129-136 (1999), or the telop may
be read by a person and manually input.
[0269] At step S3302, important words are taken out
from the telop character string which has been read. In
the judgment of important words, an important word
dictionary 3303 is used. The telop character string which
is read may be used as text information as it is.
Alternatively, extracted words may be arranged, and a
sentence representing the video frame section may be
constituted with only the important words to provide text
information.
[0270] FIG. 56 shows one example for obtaining the 
text information from the sound. FIG. 56 is also estab- 
lished as a function block diagram. 
[0271] In the sound recognition processing at step
S3401, sound is recognized.

[0272] At step S3402, important words are taken out
from the recognized sound data. In the judgment of
important words, an important word dictionary 3403 is
used. The recognized sound data may be used as text
information as it is. Alternatively, extracted words may
be arranged, and a sentence which represents the video
frame section may be constituted with only the important
words to provide text information.
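
The dictionary-based selection of important words, which applies equally to a telop character string and to recognized sound data, can be sketched as below. Whitespace tokenization, lower-casing, and the punctuation stripping are simplifying assumptions for English text; the function names and the sample dictionary are hypothetical.

```python
def extract_important_words(recognized_text, dictionary):
    """Return the words of recognized_text that appear in the dictionary."""
    words = [w.strip(".,!?") for w in recognized_text.lower().split()]
    return [w for w in words if w in dictionary]


def make_text_information(recognized_text, dictionary):
    """Arrange the extracted important words into one text item."""
    return " ".join(extract_important_words(recognized_text, dictionary))


important_word_dictionary = {"goal", "scores"}
recognized = "The striker scores a late goal."
summary_text = make_text_information(recognized, important_word_dictionary)
```

The extracted words are kept in their original order, so the resulting text still follows the flow of the telop or utterance it came from.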

[0273] FIG. 57 shows an example of processing pro- 
cedure for taking out text information and preparing the 
text information with telop recognition from the shot or 
from the video frame section having high importance. 
FIG. 57 is also established as a function block diagram. 
[0274] At step S3501, the shot is detected from the 
video. Instead of the shot, the section having high im- 
portance may be determined. 

[0275] At step S3502, the telop represented in the vid- 
eo frame section is recognized. 

[0276] At step S3503, the important words are ex- 
tracted by using an important word dictionary 3504. 
[0277] At step S3505, text for the display is created.
For this purpose, a telop character string including
important words may be used. Only the important words or a
character string using the important words may be used
as text information. If text information is to be obtained
by sound recognition, the telop recognition processing at
step S3502 is replaced with sound recognition processing
which takes sound data as input. The text information is
displayed together with the video frame in which the text
is displayed as telop, or with the video frame of the time
at which the data is reproduced as sound. Otherwise, the
text information of the video frame section may be
displayed all at once.

[0278] FIGS. 58A and 58B are views showing a display
example of the text information. As shown in FIG.
58A, the display may be divided into the text information
display area 3601 and the video display area 3602. As
shown in FIG. 58B, the text information may be
overlapped with the video display area 3603.
[0279] Respective display times (reproduction times)
of the video frame, the sound information and the text
information may be adjusted so that all the media
information is synchronized. For example, at the time of
double speed reproduction of the video, important
sounds are extracted by the above method, so that sound
information half as long as that of normal reproduction is
obtained. Next, the display time is allocated to the video
frames associated with the respective sounds. If the display
time of the video frame is determined so that the scene
change quantity becomes constant, the sound reproduction
time or the text display time is set to be within
the display time of the respectively associated video
frames. Otherwise, a section including a plurality of video
frames is determined like the shot, so that the sound
or the text included in the section is reproduced or
displayed in accordance with the display time of the
section.

[0280] The explanation so far has focused mainly on
video data. However, the data structure of the
present invention can be modified for data having no
frame information, i.e., sound data. It is possible to
use the sound information and text information in a form
without the frame information. In this case, a summary
is created which comprises only sound information or
text information with respect to the original video data.
In addition, a summary can be created which comprises
only sound information and text information with respect
to sound data and music data.
[0281] Though the data structures shown in FIGS. 50 
and 51 are used to describe the sound information and 
text information in synchronization with the video data, 
it is possible to summarize the sound data and text data 
only. To summarize the sound data, the data structure 
shown in FIG. 50 can be used irrespective of the video 
information. To summarize the text data, the data struc- 
ture shown in FIG. 51 can be used irrespective of the 
video information. At that time, in the same manner as
in the case of the frame information, the original data
information may be added to describe the correspondence
relationship between the original sound and music
data and the sound information and text information.
[0282] FIG. 59 shows an example of a data structure 
in which the original data information 4901 is included 
in the sound information shown in FIG. 50. If the original 
data is the video, the original data information 4901 in- 
dicates the section of video frames (start point informa- 
tion 4902 and section length information 4903). 
[0283] If the original data is sound data and music da- 
ta, the original data information 4901 indicates the sec- 
tion of sound and music. 

[0284] FIG. 60 shows an example of a data structure 
in which the original data information 4901 is included 
in the sound information shown in FIG. 30. 
[0285] FIG. 61 explains an example in which sound/ 
music is summarized by using the sound information. 
The original sound/music is divided into several
sections. A portion of a section is extracted as the
summarized sound/music so that the summary of the
original data is created. For example, a portion 5001 of the
section 2 is extracted as summarized sound/music to 
be reproduced as a section 5002 of the summary. As an 
example of a method for dividing the section, the music 
may be divided into chapters and the conversation may 
be divided by the contents. 
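
The summarization of FIG. 61, together with the original data information of FIG. 59, can be sketched as follows. Taking the beginning of every section as the extracted portion is purely an assumption made for illustration, since the text does not say which portion is chosen; the class and field names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OriginalDataInfo:
    start: float   # start point information 4902
    length: float  # section length information 4903


@dataclass
class SoundInfo:
    location: float             # location information 2801
    start_time: float           # reproduction start time 2802
    duration: float             # reproduction time 2803
    original: OriginalDataInfo  # original data information 4901


def summarize(sections, excerpt_length):
    """Build a summary from the head of each (start, length) section."""
    summary = []
    clock = 0.0
    for start, length in sections:
        duration = min(excerpt_length, length)
        summary.append(
            SoundInfo(start, clock, duration, OriginalDataInfo(start, length))
        )
        clock += duration  # excerpts are reproduced back to back
    return summary


sections = [(0.0, 60.0), (60.0, 5.0), (65.0, 30.0)]
summary = summarize(sections, excerpt_length=10.0)
total_length = sum(item.duration for item in summary)
```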

[0286] Furthermore, in the same manner as in the
case of the frame information, the description of the
original data file and the section are included in the sound
information and the text information, with the result that
a plurality of sound/music data items can be summarized
together. At this time, if identification information
is added to the individual original data, the original data
identification information may be described in place of
the original data file and the section.
[0287] FIG. 62 explains an example in which sound/ 
music is summarized by using the sound information. 
Portions of plural sound/music data items are extracted 
as the summarized sound/music so that the summary 
of the original data is created. For example, a portion
5001 of the sound/music data item 2 is extracted as
summarized sound/music to be reproduced as a section
5102 of the summary. As one usage, a portion of each piece
of music included in a music album may be extracted, so
that summarized data for trial listening can be created.

[0288] If an album is summarized, the title of the music
may be included in the music information when it is
preferable that the title of the music can be known. This
information is not indispensable.

[0289] Next, a method of providing video data will be 
explained. 

[0290] If the special reproduction control information
created in the processing of the embodiment is provided
for use, it is necessary to deliver the special
reproduction control information from the side of those who
create the information to the side of the user by some
means. As the method of providing the special
reproduction control information, various forms can be
considered, as exemplified below:

(1) Video data and special reproduction control in- 
formation are recorded on one (or a plurality of) re- 
cording medium (or media) and provided at the 
same time; 

(2) Video data is recorded on one (or a plurality of) 
recording medium (or media) and provided, and the 
special reproduction control information is sepa- 
rately recorded on one (or a plurality of) recording 
medium (media) and provided; 

(3) Video data and the special reproduction control 
information are provided via the communication 
medium at the same occasion; 

(4) Video data and the special reproduction control 
information are provided via the communication 
media at different occasions. 

[0291] According to the above described embodi- 
ments, a special reproduction control information de- 
scribing method for describing special reproduction con- 
trol information provided for special reproduction with 
respect to the video contents describes, as the frame 
information, for each of frames or groups of continuous 
or adjacent frames selectively extracted from the whole 
frame series of video data constituting the video con- 
tents, first information showing a location at which video 
data of the one frame or one group is present and
second information associated with display time allocated
to the one frame or the frame group, and/or third infor- 
mation showing importance allocated to the one frame 
or the frame group corresponding to the frame informa- 
tion. 

[0292] According to the above described embodi- 
ments, a computer readable recording medium storing 
a special reproduction control information stores at least 
frame information described for each of frames or 
groups of continuous or adjacent frames selectively ex- 
tracted from the whole frame series of video data con- 
stituting the video contents, the frame information com- 
prising first information showing a location at which vid- 
eo data of the one frame or one group is present and 
second information associated with display time allocat- 
ed to the one frame or the frame group, and/or third in- 
formation showing importance allocated to the one 
frame or the frame group corresponding to the frame 
information. 
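
As an illustration only, the frame information described above could be held in a structure such as the following minimal Python sketch. The class and field names (`FrameInfo`, `location`, `display_time`, `importance`) are hypothetical and do not appear in the embodiments; the location is assumed here to be a frame index, although it could equally be a time stamp or a file reference.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FrameInfo:
    """One item of frame information (field names are assumptions)."""
    location: int                       # first information: where the frame data is present
    display_time: float                 # second information: seconds allocated for display
    importance: Optional[float] = None  # third information: optional importance value


@dataclass
class SpecialReproductionControlInfo:
    """An ordered collection of frame information items."""
    frames: List[FrameInfo]

    def total_display_time(self) -> float:
        # Sum of the display times allocated to all extracted frames.
        return sum(f.display_time for f in self.frames)


# Hypothetical example values.
control = SpecialReproductionControlInfo(frames=[
    FrameInfo(location=0, display_time=0.5),
    FrameInfo(location=120, display_time=1.5, importance=0.9),
    FrameInfo(location=300, display_time=1.0, importance=0.2),
])
```

The importance field is optional because the frame information may carry the display time, the importance, or both.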

[0293] According to the above described embodi- 
ments, a special reproduction control information de- 
scribing apparatus/method for describing special repro- 
duction control information provided for special repro- 
duction with respect to the video contents describes, as 
the frame information, for each of frames or groups of 
continuous or adjacent frames selectively extracted 
from the whole frame series of video data constituting 
the video contents, video location information showing 
a location at which video data of the one frame or one 
group is present and display time control information in- 
cluding display time information and basic information 
based on which the display time is calculated, to be al- 
located to the one frame or the frame group. 
[0294] According to the above described embodiments,
a special reproduction apparatus/method enables a
special reproduction with respect to video contents as
follows. Special reproduction control information is
referred to, which includes at least frame information
described for each frame or each group of continuous or
adjacent frames selectively extracted from the whole
frame series of the video data constituting the video
contents, the frame information including video location
information showing a location at which the one frame
data or the frame group data is present. The one frame
data or the frame group data corresponding to each frame
information is obtained on the basis of the video
location information included in the frame information,
while the display time to be allocated to each frame
information is determined on the basis of display time
control information included in at least each frame
information. The obtained data on the one frame or the
plurality of frames is then reproduced for the determined
display time in a predetermined order, thereby carrying
out the special reproduction.
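
The reproduction steps above (refer to the frame information, obtain the data, determine the display time, display in order) can be sketched as follows. This is a simplified illustration, not the apparatus of the embodiments: `fetch` and `show` are hypothetical callbacks standing in for the video access and display units, and the display time is assumed to be stored directly in each item.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical frame information: (location, display_time_seconds)
FrameInfo = Tuple[int, float]


def special_reproduction(
    frame_infos: Sequence[FrameInfo],
    fetch: Callable[[int], str],
    show: Callable[[str, float], None],
) -> None:
    """Reproduce the extracted frames in the order in which they are described."""
    for location, display_time in frame_infos:
        data = fetch(location)    # obtain the data from the video location information
        show(data, display_time)  # display it for the determined display time


# Stand-in callbacks that only record what would be displayed.
log: List[Tuple[str, float]] = []
special_reproduction(
    [(0, 0.5), (120, 1.5), (300, 1.0)],
    fetch=lambda loc: f"frame@{loc}",
    show=lambda data, t: log.append((data, t)),
)
```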
[0295] In the above described embodiments, for example,
location information on the effective video frames used
for display, or image data extracted in frame units from
the original video, is created in advance, and the video
frame location information or the information on the
display time of the image data is created separately
from the original video. Either the video frames or the
image data extracted from the original video are
continuously displayed on the basis of the display
information, so that a special reproduction such as a
double speed reproduction, a trick reproduction, a jump
continuous reproduction or the like is enabled.
[0296] In the double speed reproduction for confirm- 
ing the contents at a high speed, display time is deter- 
mined in advance in such a manner that the display time 
is extended at a location where a motion of the scene is 
large while the display time is shortened at a location 
where the motion is small, so that the change in the
display screen is kept as constant as possible.
Alternatively, the same effect can be obtained when the
location information is determined so that the interval
between the extracted locations of the video frames or
video data used for the display is made small at a
location where the motion is large, while the interval
is made large at a location where the motion is small. A
reproduction speed control value may be created so that
the reproduction as a whole attains the double speed
value or the reproduction time designated by the user. A
long video can thus be viewed by double speed
reproduction, so that the video can be easily viewed and
its contents can be grasped in a short time.
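
One possible way to allocate display times so that the change on the screen per unit time stays roughly constant is to make each display time proportional to a motion (activity) value and then scale the result to a total reproduction time designated by the user. The sketch below assumes such a proportional rule; the function name and the activity values are hypothetical, and the embodiments may calculate the display time differently.

```python
from typing import List, Sequence


def allocate_display_times(activity: Sequence[float], total_time: float) -> List[float]:
    """Split total_time among frames in proportion to their motion (activity)."""
    if not activity:
        return []
    total_activity = sum(activity)
    if total_activity <= 0:
        # No measurable motion: fall back to equal display times.
        return [total_time / len(activity)] * len(activity)
    return [total_time * a / total_activity for a in activity]


# Hypothetical activity values for four extracted frames, 10 s requested in total.
times = allocate_display_times([1.0, 3.0, 4.0, 2.0], total_time=10.0)
```

With this rule a frame with three times the motion is displayed three times as long, and the display times always add up to the designated total.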

[0297] It is possible to reproduce videos so that im- 
portant locations are not overlooked by extending the 
display time at the important locations and shortening 
the display time at unimportant locations in accordance 
with the importance of the video. 
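
As a hedged illustration of importance-dependent display time, the sketch below maps an importance value, assumed to lie between 0 and 1, linearly onto a display time between a minimum and a maximum. The function name, the value range and the default limits are assumptions made for this example only.

```python
def display_time_from_importance(
    importance: float, min_time: float = 0.25, max_time: float = 2.0
) -> float:
    """Longer display for important locations, shorter for unimportant ones."""
    clamped = min(1.0, max(0.0, importance))  # keep importance within [0, 1]
    return min_time + (max_time - min_time) * clamped
```

For example, an importance of 0.5 yields a display time midway between the two limits.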
[0298] Only important locations may be efficiently re- 
produced by partially omitting a part of the video without 
displaying the whole video frame. 
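
Omitting part of the video can be sketched as a simple filter over the frame information. The example assumes that each item carries an importance value and that items below a threshold are skipped; the tuple layout and the threshold rule are hypothetical, and a per-frame reproduction flag could serve the same purpose.

```python
from typing import List, Tuple

# Hypothetical frame information: (location, display_time_seconds, importance)
FrameInfo = Tuple[int, float, float]


def select_important(frames: List[FrameInfo], threshold: float) -> List[FrameInfo]:
    """Keep only the frames whose importance reaches the threshold."""
    return [f for f in frames if f[2] >= threshold]


frames = [(0, 0.5, 0.1), (120, 1.5, 0.9), (300, 1.0, 0.2), (480, 1.0, 0.7)]
digest = select_important(frames, threshold=0.5)
```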
[0299] According to embodiments of the present in- 
vention, an effective special reproduction is enabled on 
the basis of the control information on the reproduction 
side by arranging and describing as control information 
provided for a special reproduction of the video contents 
a plurality of frame information including a method for 
obtaining a frame or a group of frames selectively ex- 
tracted from the original video, information on the dis- 
play time (absolute or relative value) allocated to the 
frame or the group of frames and information which
forms the basis for obtaining the information on the dis- 
play time. 

[0300] For example, each of the above functions can 
be realized as software. The above embodiments can 
be realized as a computer readable recording medium 
on which a program is recorded for allowing the compu- 
ter to conduct predetermined means or for allowing the 
computer to function as predetermined means, or for al- 
lowing the computer to realize a predetermined function. 
[0301] The structures shown in each of the embodi- 
ments are one example, and are not intended to exclude 
other structures. It is also possible to provide a structure 
which is obtained by replacing a part of the structure
exemplified above with another structure, omitting a part 
of the exemplified structure, adding a different function 
to the exemplified structure, and combining such meas- 
ures. A different structure logically equivalent to the ex- 
emplified structure, a different structure including a part 
logically equivalent to the exemplified structure, and a 
different structure logically equivalent to the essential 
portion of the exemplified structure can be provided. An- 
other structure identical to or similar to the exemplified 
structure, or a different structure having the same effect 
as the exemplified structure or a similar effect can be 
provided. 

[0302] In each of the embodiments, various variations 
with respect to various structure components can be put 
into practice in an appropriate combination. 
[0303] Each of the embodiments includes or inherently
contains inventions associated with various viewpoints,
stages, concepts or categories such as, for example, an
invention as a method for describing information, an
invention as the information which is described, an
invention as an apparatus or a method corresponding
thereto, and an invention as an internal part of the
apparatus or a method corresponding thereto.

[0304] Consequently, the invention can be extracted 
without being limited to the exemplified structure from 
the content disclosed in the embodiment according to 
this invention. 



Claims 

1. A method of describing frame information, the 
method characterized by comprising: 

describing, for a frame extracted from a plural- 
ity of frames in a source video data, first infor- 
mation (101) specifying a location of the ex- 
tracted frame in the source video data; and 
describing, for the extracted frame, second in- 
formation (102) relating to a display time of the 
extracted frame. 

2. The method according to claim 1, characterized in
that the extracted frame comprises a group of 
frames, and the first information comprises informa- 
tion specifying a location of the extracted group of 
frames in the source video data. 

3. The method according to claim 1 or 2, character- 
ized by further comprising describing, for the ex- 
tracted frame, third information (122) relating to im- 
portance of the extracted frame. 

4. The method according to claim 1, 2 or 3, characterized
in that the first information comprises infor-
mation specifying an image data file created from 
the video data of the extracted frame. 



5. The method according to claim 1, characterized in
that the extracted frame comprises a frame extracted
from a plurality of frames included in a temporal
section of the source video data, and further describing
fourth information specifying the temporal section of
the source video data.

6. The method according to claim 5, characterized in
that the first information comprises information
specifying an image data file created from the source
video data of the extracted frame, the image data
corresponding to the extracted frame.

7. The method according to any one of the preceding
claims, characterized in that the second information
comprises information relating to such display time that
a frame activity value during a special reproduction is
kept substantially constant.

8. The method according to any one of the preceding
claims, characterized by further comprising describing
fifth information (123) indicating whether the extracted
frame is reproduced or not.

9. The method according to claim 1, characterized in
that the first information comprises one of information
specifying a location of the extracted frame among the
plurality of frames and information specifying a
location of image data within an image data file created
from the source video data and stored separately from
the video data, the image data corresponding to the
extracted frame.

10. The method according to any one of the preceding
claims, characterized by further comprising describing,
for media data other than the source video data
including the extracted frame, information specifying a
location of the media data and information relating to a
display time of the media data.

11. An article of manufacture comprising a computer
usable medium storing frame information, the frame
information characterized by comprising:

first information (101), described for a frame
extracted from a plurality of frames, specifying a
location of the extracted frame in the source video
data; and

second information (102), described for the extracted
frame, relating to a display time of the extracted
frame.

12. The article of manufacture according to claim 11,
characterized in that the extracted frame comprises a
group of frames, and the first information comprises
information specifying a location of the extracted group
of frames in the source video data.

13. The article of manufacture according to claim 11 or
12, characterized in that the frame information
comprises third information (122) relating to importance
of the extracted frame.

14. The article of manufacture according to claim 11, 12
or 13, characterized in that the first information 
comprises information specifying an image data file 
created from the video data of the extracted frame. 

15. The article of manufacture according to claim 11, 
characterized by further comprising storing the 
source video data and an image data file corre- 
sponding to the source video data of the extracted 
frame in addition to the frame information. 

16. An apparatus for creating frame information, the ap- 
paratus characterized by comprising: 

a unit configured to extract a frame from a plu- 
rality of frames in a source video data; 
a unit configured to create the frame informa- 
tion including first information specifying a lo- 
cation of the extracted frame and second infor- 
mation relating to a display time of the extracted 
frame; and 

a unit configured to link the extracted frame to 
the frame information. 

17. A method of creating frame information, the method
characterized by comprising: 

extracting a frame from a plurality of frames in 
a source video data; and 
creating the frame information including first in- 
formation specifying a location of the extracted 
frame in the source video data and second in- 
formation relating to a display time of the ex- 
tracted frame. 

18. An apparatus for performing a special reproduction,
characterized by comprising:

a unit configured to refer to frame information
described for a frame extracted from a plurality of
frames in a source video data and including first
information specifying a location of the extracted frame
in the source video data and second information relating
to a display time of the extracted frame;

a unit configured to obtain the video data corresponding
to the extracted frame based on the first information;

a unit configured to determine the display time of the
extracted frame based on the second information; and

a unit configured to display the obtained video data for
the determined display time.

19. A method of performing a special reproduction
characterized by comprising:

referring to frame information described for a frame
extracted from a plurality of frames in a source video
data and including first information (101) specifying a
location of the extracted frame and second information
(102) relating to a display time of the extracted frame;

obtaining the video data corresponding to the extracted
frame based on the first information;

determining the display time of the extracted frame
based on the second information; and

displaying the obtained video data for the determined
display time.

20. An article of manufacture comprising a computer
usable medium having computer readable program code
means embodied therein, the computer readable program
code means performing a special reproduction, the
computer readable program code means characterized by
comprising:

computer readable program code means for causing a
computer to refer to frame information described for a
frame extracted from a plurality of frames in a source
video data and including first information (101)
specifying a location of the extracted frame and second
information (102) relating to a display time of the
extracted frame;

computer readable program code means for causing a
computer to obtain the video data corresponding to the
extracted frame based on the first information;

computer readable program code means for causing a
computer to determine the display time of the extracted
frame based on the second information; and

computer readable program code means for causing a
computer to display the obtained video data for the
determined display time.



21. A method of describing sound information, the 
method characterized by comprising: 

describing, for a frame extracted from a plural- 
ity of sound frames in a source sound data, first 
information specifying a location of the extract- 
ed frame in the source sound data; and 
describing, for the extracted frame, second in- 
formation relating to a reproduction start time 
and reproduction time of the sound data of the 
extracted frame. 

22. An article of manufacture comprising a computer 
usable medium storing frame information, the frame 
information characterized by comprising: 
first information, described for a frame extracted
from a plurality of sound frames, specifying a location
of the extracted frame in the source sound data; and

second information, described for the extracted frame,
relating to a reproduction start time and reproduction
time of the sound data of the extracted frame.

23. A method of describing text information, the method
characterized by comprising:

describing, for a frame extracted from a plurality of
text frames in a source text data, first information
specifying a location of the extracted frame in the
source text data; and

describing, for the extracted frame, second information
relating to a display start time and display time of the
text data of the extracted frame.

24. An article of manufacture comprising a computer 
usable medium storing frame information, the frame 
information characterized by comprising: 

first information, described for a frame extracted
from a plurality of text frames in a source text data,
specifying a location of the extracted frame in the
source text data; and

second information, described for the extracted frame,
relating to a display start time and display time of the
text data of the extracted frame.

25. A carrier medium carrying computer readable
instructions for controlling the computer to carry out
the method of any one of claims 1 to 10, 17, 19, 21
and 23.